Introduction


Fact 1 


All lectures for this class are recorded ahead of time and watched asynchronously on Panopto (notably, the dates 


are all approximate in this document). The best part about this is that we can pause and rewind the lecture while 


taking notes, and information for how to contact course staff and attend office hours is on the course website. 


We'll start with a bit of explanation for what functional analysis aims to do. In some previous math classes, like 
calculus and linear algebra, the methods that we learn help us solve equations with finitely many variables. (For 
example, we might want to find the minimum or maximum value of a function whose inputs are in $\mathbb{R}^n$, or we might
want to solve a set of linear equations.) This helps us solve a lot of problems, but then we come across ODEs, PDEs, 


minimization, and other problems, where the set of independent variables is not finite-dimensional anymore: 


Example 2 


If we consider a problem like “finding the shortest possible curve between two points,” this problem is specifying a 


functional, meaning that the input is a function. And we need infinitely many real numbers to specify a real-valued 
function $f : [0,1] \to \mathbb{R}$.


So functional analysis helps us solve problems where the vector space is no longer finite-dimensional, and we'll see 


later on that this situation arises very naturally in many concrete problems. 
We'll use a lot of terminology from real analysis and linear algebra, but we'll redefine a few terms just to make sure
we're all on the same page.
We'll start with normed spaces, which are the analog of $\mathbb{R}^n$ for functional analysis. First, a reminder of the


definition: 


Definition 3 


A vector space $V$ over a field $K$ (which we'll take to be either $\mathbb{R}$ or $\mathbb{C}$) is a set of vectors which comes with
an addition $+ : V \times V \to V$ and scalar multiplication $\cdot : K \times V \to V$, along with some axioms: commutativity,
associativity, identity, and inverse of addition, identity of multiplication, and distributivity.


Example 4 


$\mathbb{R}^n$ and $\mathbb{C}^n$ are vector spaces, and so is $C([0, 1])$, the space of continuous functions $[0, 1] \to \mathbb{C}$. (This last example


is indeed a vector space because the sum of two continuous functions is continuous, and so is a scalar multiple of 


a continuous function.) 


But $C([0, 1])$ is a completely different size from the other vector spaces we mentioned above, and this is going back
to the “finite-dimensional” versus “infinite-dimensional” idea that we started with. Let’s also make sure we remember 


the relevant definition here: 


Definition 5 
A vector space $V$ is finite-dimensional if every linearly independent set is a finite set. In other words, for all sets
$E \subset V$ such that

$$\sum_{i=1}^{N} a_i v_i = 0 \implies a_1 = a_2 = \cdots = a_N = 0 \quad \forall \text{ distinct } v_1, \ldots, v_N \in E,$$

$E$ has a finite cardinality. $V$ is infinite-dimensional if it is not finite-dimensional.


We'll be dealing mostly with infinite-dimensional vector spaces in this class, and we're basically going to “solve 


linear equations” or “do calculus” on them. 


Example 6 


We can check that $C([0, 1])$ is infinite-dimensional, because the set
$$E = \{f_n(x) = x^n : n \in \mathbb{Z}_{\ge 0}\}$$


is linearly independent but contains infinitely many elements. 


What we'll see is that facts like the Heine-Borel theorem for $\mathbb{R}^n$ become false in infinite-dimensional spaces, so
we'll need to develop some more machinery. 
In analysis, we needed a notion of “how close things are” to state a lot of results, and we did that with metrics on 


metric spaces. We'll try defining such a distance on our vector spaces now: 


Definition 7 

A norm on a vector space $V$ is a function $\|\cdot\| : V \to [0, \infty)$ satisfying the following three properties:
1. (Definiteness) $\|v\| = 0$ if and only if $v = 0$,
2. (Homogeneity) $\|\lambda v\| = |\lambda| \|v\|$ for all $v \in V$ and $\lambda \in K$,
3. (Triangle inequality) $\|v_1 + v_2\| \le \|v_1\| + \|v_2\|$ for all $v_1, v_2 \in V$.

A seminorm is a function $\|\cdot\| : V \to [0, \infty)$ which satisfies (2) and (3) but not necessarily (1), and a vector
space equipped with a norm is called a normed space.


We can indeed check that this is consistent with the definition of a metric $d : X \times X \to [0, \infty)$, which has the


following three conditions: 


1. (Identification) $d(x, y) = 0$ if and only if $x = y$,
2. (Symmetry) $d(x, y) = d(y, x)$ for all $x, y \in X$,
3. (Triangle inequality) $d(x, y) + d(y, z) \ge d(x, z)$ for all $x, y, z \in X$.


Indeed, we can turn our norm into a metric (and thus think of our normed space as a metric space): 


Proposition 8 


Let $\|\cdot\|$ be a norm on a vector space $V$. Then

$$d(v, w) = \|v - w\|$$


defines a metric on V, which we call the “metric induced by the norm.” 


Proof. We just need to check the three conditions above: property (1) of the norm implies property (1) of metrics,
because
$$d(v, w) = \|v - w\| = 0 \iff v - w = 0 \iff v = w.$$

For property (2) of the metric, note that
$$\|v - w\| = \|(-1)(w - v)\| = |-1| \, \|w - v\| = \|w - v\|,$$

by using property (2) of the norm. And finally, property (3) of the metric is implied by property (3) of the norm
because $(x - y) + (y - z) = (x - z)$.


Example 9 


The Euclidean norm on $\mathbb{R}^n$ or $\mathbb{C}^n$, given by

$$\|x\|_2 = \left(\sum_{i=1}^{n} |x_i|^2\right)^{1/2},$$

is indeed a norm (this is the standard notion of “distance” that we're used to). But we can also define

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|$$

(the “length” of a vector is the largest magnitude of any component), and more generally (for $1 \le p < \infty$)

$$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}.$$


We can draw a picture of the “unit balls” in $\mathbb{R}^2$ for the different norms we've defined above. Recall that $B(x, r)$ is
the set of points that are less than $r$ away from $x$: under the norm $\|\cdot\|_2$, $B(0, 1)$ looks like a circle, but under the norm
$\|\cdot\|_\infty$, $B(0, 1)$ looks like a square with vertices at $(\pm 1, \pm 1)$, and under the norm $\|\cdot\|_1$, it looks like a square with
vertices at $(0, 1), (1, 0), (0, -1), (-1, 0)$. In general, the different $\|\cdot\|_p$ norms will give “unit balls” that are between
those two squares described above.

So changing the norm does change the geometry of the balls, but not too drastically: if we take a large enough
$\ell^p$ ball (that is, a ball $B(0, r)$ with large enough $r$ under the $\|\cdot\|_p$ norm), it will always swallow up an $\ell^\infty$ ball of any
fixed size. This “sandwiching” basically means that the norms are essentially equivalent, but we'll get to that later in
the course.
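As a quick numerical sanity check of this sandwiching, here is a minimal Python sketch. The helper `p_norm` and the sample vector are illustrative assumptions, not part of the lecture; the sketch only verifies the chain $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\|x\|_\infty$ for one vector in $\mathbb{R}^2$.

```python
import math

def p_norm(x, p):
    """l^p norm of a finite vector x; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0]  # hypothetical sample vector
n1, n2, ninf = p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf)

# Sandwich: each norm is controlled by a constant multiple of the others.
assert ninf <= n2 <= n1 <= len(x) * ninf
print(n1, n2, ninf)
```

The constants in such a chain depend on the dimension $n$, which is one reason the infinite-dimensional setting needs more care.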


But we can now get to examples of norms on vector spaces that aren’t necessarily finite-dimensional: 


Definition 10 
Let $X$ be a metric space. The vector space $C_\infty(X)$ is defined as

$$C_\infty(X) = \{f : X \to \mathbb{C} : f \text{ continuous and bounded}\}.$$

For example, $C_\infty([0, 1])$ is $C([0, 1])$, because all continuous functions on $[0, 1]$ are bounded.


Proposition 11 


For any metric space $X$, we can define a norm on the vector space $C_\infty(X)$ via

$$\|u\|_\infty = \sup_{x \in X} |u(x)|.$$


Proof. Properties (1) and (2) of a norm are clear from the definitions, and we can show property (3) as follows. If
$u, v \in C_\infty(X)$, then for any $x \in X$, we have
$$|u(x) + v(x)| \le |u(x)| + |v(x)|$$

by the triangle inequality for $\mathbb{C}$, and this is at most $\|u\|_\infty + \|v\|_\infty$ (because $|u(x)|$ is bounded by its supremum, and so is
$|v(x)|$). Thus, we indeed have

$$|u(x) + v(x)| \le \|u\|_\infty + \|v\|_\infty \ \ \forall x \in X \implies \|u + v\|_\infty = \sup_{x \in X} |u(x) + v(x)| \le \|u\|_\infty + \|v\|_\infty.$$


And now that we have a norm, we can think about convergence in that norm: we have $u_n \to u$ in $C_\infty(X)$
(convergence of the sequence) if

$$\lim_{n \to \infty} \|u_n - u\|_\infty = 0,$$
which we can unpack in more familiar analysis terms as

$$\forall \varepsilon > 0, \ \exists N \in \mathbb{N} : \ \forall n \ge N, \ \forall x \in X, \ |u_n(x) - u(x)| < \varepsilon,$$


which is the definition of uniform convergence on X. So convergence in this metric (we'll use metric and norm 
interchangeably, since the metric is induced by the norm) is really a statement of uniform convergence when we have 
bounded, continuous functions. 
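To see what convergence in the $\|\cdot\|_\infty$ norm rules out, consider (as an illustrative example that is not taken from the lecture) $u_n(x) = x^n$ on $X = [0, 1)$. Each $u_n$ is bounded and continuous, and $u_n(x) \to 0$ for each fixed $x$, but at the point $x_n = 2^{-1/n} \in [0, 1)$ we get $u_n(x_n) = 1/2$, so $\|u_n - 0\|_\infty \ge 1/2$ for every $n$ and the convergence is not uniform. The function names below are hypothetical.

```python
def u(n, x):
    """u_n(x) = x**n, a bounded continuous function on [0, 1)."""
    return x ** n

def witness_point(n):
    """A point x_n in [0, 1) where u_n(x_n) = 1/2 (up to rounding)."""
    return 2.0 ** (-1.0 / n)

for n in (1, 10, 100, 1000):
    x_n = witness_point(n)
    # Pointwise: at the fixed point x = 0.9 the values shrink toward 0,
    # but the value at the moving point x_n stays at about 1/2.
    print(n, u(n, 0.9), u(n, x_n))
```

So the pointwise limit exists here, but $\{u_n\}$ does not converge to it in $C_\infty([0, 1))$.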


Let’s now write down a few more examples of normed vector spaces: 


Definition 12 


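The $\ell^p$ space is the space of (infinite) sequences
$$\ell^p = \{\{a_j\}_{j=1}^{\infty} : \|a\|_p < \infty\},$$

where we define the $\ell^p$ norm

$$\|a\|_p = \begin{cases} \left(\sum_{j=1}^{\infty} |a_j|^p\right)^{1/p} & 1 \le p < \infty, \\ \sup_{1 \le j < \infty} |a_j| & p = \infty. \end{cases}$$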


Example 13 


The sequence $\{1/j\}_{j=1}^{\infty}$ is in $\ell^p$ for all $p > 1$ but not in $\ell^1$ (by the usual p-series test).
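The p-series behavior behind this example can be observed numerically. The following is only a sketch with a hypothetical helper `partial_p_sum`: the partial sums of $\sum_j (1/j)^p$ stay below $\pi^2/6$ for $p = 2$, while for $p = 1$ they keep growing (roughly like $\log N$).

```python
def partial_p_sum(p, N):
    """Partial sum of (1/j)**p for j = 1, ..., N."""
    return sum((1.0 / j) ** p for j in range(1, N + 1))

for N in (10, 1000, 100000):
    # Second column (p = 1) keeps growing; third column (p = 2) levels off.
    print(N, partial_p_sum(1, N), partial_p_sum(2, N))
```

A finite computation like this cannot prove divergence, of course; it only illustrates what the p-series test guarantees.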


Checking that the triangle inequality holds in this space (or even in the finite-dimensional case) is nontrivial, so it's
not clear that we necessarily have a normed vector space $\ell^p$ here! But it'll be in the exercises for us to work out the
details.

And now we can talk about the central objects in functional analysis that we're really interested in, which are the 


analogs of $\mathbb{R}^n$ and $\mathbb{C}^n$ in that they're complete (Cauchy sequences always converge).


Definition 14 


A normed space is a Banach space if it is complete with respect to the metric induced by the norm. 


We've learned in real analysis that Q is not complete, because we can construct a sequence of rationals that 


converge to an irrational number. So R “fills in the holes,” and we want that property for our Banach spaces. 


Example 15 


For any $n \in \mathbb{Z}_{\ge 0}$, $\mathbb{R}^n$ and $\mathbb{C}^n$ are complete with respect to any of the $\|\cdot\|_p$ norms.


Theorem 16 


For any metric space $X$, the space of bounded, continuous functions on $X$ is complete, and thus $C_\infty(X)$ is a


Banach space. 


Proof. We want to show that every Cauchy sequence $\{u_n\}$ converges, meaning that it has some limit $u$ in $C_\infty(X)$.
This proof basically illustrates how we prove that spaces are Banach in general: take a Cauchy sequence, come up
with a candidate for the limit, and show that (1) this candidate is in the space and (2) convergence does occur.

So if we have our Cauchy sequence $\{u_n\}$, first we show that it is bounded in the norm on $C_\infty(X)$. To see this,
note that there exists some positive integer $N_0$ such that for all $n, m \ge N_0$,
$$\|u_n - u_m\|_\infty < 1.$$

So now for all $n \ge N_0$,

$$\|u_n\|_\infty \le \|u_n - u_{N_0}\|_\infty + \|u_{N_0}\|_\infty < 1 + \|u_{N_0}\|_\infty$$
by the triangle inequality, and thus for all $n \in \mathbb{N}$, we have
$$\|u_n\|_\infty \le \|u_1\|_\infty + \cdots + \|u_{N_0}\|_\infty + 1$$

(because we need to make sure the first few terms are also accounted for). So we can bound $\|u_n\|_\infty$ by some finite
positive $B$, and thus we have a bounded sequence in the space $C_\infty(X)$.


So now if we focus on a particular $x \in X$, we have
$$|u_n(x) - u_m(x)| \le \sup_{x} |u_n(x) - u_m(x)| = \|u_n - u_m\|_\infty,$$

and because $\{u_n\}$ is Cauchy, for any $x \in X$, the sequence of complex numbers $\{u_n(x)\}$ (where we evaluate each
function $u_n$ at the fixed $x$) is a Cauchy sequence. But the space of complex numbers is a complete metric space, so
for all $x \in X$, $u_n(x)$ converges to some limit, which will help us define our candidate function:
$$u(x) = \lim_{n \to \infty} u_n(x).$$
This is basically the pointwise limit, and we now need to show this is in $C_\infty(X)$ and that we have convergence under
the uniform convergence norm. Now we know that
$$|u(x)| = \lim_{n \to \infty} |u_n(x)|$$

(if the limit exists, so does the limit of the absolute values), and now we know that each term on the right-hand side is bounded by
the $\|\cdot\|_\infty$ norm of $u_n$, and thus by the $B$ that we found above. That means that

$$\sup_{x \in X} |u(x)| \le B,$$


so $u$ is indeed a bounded function. To finish the proof, we'll show continuity and convergence, which we'll do with
the usual definition. Fix $\varepsilon > 0$; since $\{u_n\}$ is Cauchy, there exists some $N$ such that for all $n, m \ge N$, we have
$\|u_n - u_m\|_\infty < \frac{\varepsilon}{2}$. So now for any $x \in X$, we have
$$|u_n(x) - u_m(x)| \le \|u_n - u_m\|_\infty < \frac{\varepsilon}{2},$$

so taking the limit as $m \to \infty$, we have that for all $n \ge N$,

$$|u_n(x) - u(x)| \le \frac{\varepsilon}{2}$$

(everything is still pointwise at a point $x$ here). So it's also true that $\sup_x |u_n(x) - u(x)| \le \frac{\varepsilon}{2} < \varepsilon$, and thus
$\|u_n - u\|_\infty \to 0$. And now because $\|u_n - u\|_\infty \to 0$, we know that $u_n \to u$ uniformly on $X$, and the uniform limit of
a sequence of continuous functions is continuous. Therefore, our candidate $u$ is in $C_\infty(X)$ and is the limit of the $u_n$s,
and thus $C_\infty(X)$ is complete and a Banach space.


This proof is a bit weird the first time we see it, but we can think about how to apply this proof to the $\ell^p$ space
(it will look very similar). And we can also try using this technique to show that the space
$$c_0 = \{a \in \ell^\infty : \lim_{j \to \infty} a_j = 0\}$$


is Banach. An important idea is that the “points” in our spaces are now sequences and functions instead of numbers, 


which is making some of the argument more complicated than in the real-number case! 
We'll continue our discussion of Banach spaces today. If $V$ is a normed space, we can check whether $V$ is Banach
by taking a Cauchy sequence and seeing whether it converges in $V$. But there's an alternate way of thinking about this:


Definition 17 


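Let $\{v_n\}_{n=1}^{\infty}$ be a sequence of points in $V$. Then the series $\sum_n v_n$ is summable if $\left\{\sum_{n=1}^{m} v_n\right\}_{m=1}^{\infty}$ converges, and
$\sum_n v_n$ is absolutely summable if $\left\{\sum_{n=1}^{m} \|v_n\|\right\}_{m=1}^{\infty}$ converges.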


This is basically analogous to the definitions of convergence and absolute convergence for series for real numbers, 


and we have a similar result as well: 


Proposition 18 


If $\sum_n v_n$ is absolutely summable, then the sequence of partial sums $\left\{\sum_{n=1}^{m} v_n\right\}_{m=1}^{\infty}$ is Cauchy.


This proof is left to us as an exercise (it’s the same proof as when V = R), and we should note that the theorem is 
that we have a Cauchy sequence, not necessarily that it is summable (like in the real-valued case). And that’s because 


we need completeness, and that leads to our next result: 


Theorem 19 


A normed vector space V is a Banach space if and only if every absolutely summable series is summable. 


This is sometimes an easier property to verify than going through the Cauchy business — in particular, it'll be useful 


in integration theory later on. 


Proof. We need to prove both directions. For the forward direction, suppose that V is Banach. Then V is complete, 
so any absolutely summable series is Cauchy and thus convergent in V (that is, summable). 

For the opposite direction, suppose that every absolutely summable series is summable. Then for any Cauchy
sequence $\{v_n\}$, let's first show that we can find a convergent subsequence. (This will imply that the whole sequence
converges by a triangle-inequality metric space argument.)

To construct this subsequence, we basically “speed up the Cauchy-ness of $\{v_n\}$.” We know that for all $k \in \mathbb{N}$, there
exists $N_k \in \mathbb{N}$ such that for all $n, m \ge N_k$, we have
$$\|v_n - v_m\| < 2^{-k}.$$
(We're choosing $2^{-k}$ because it's summable.) So now we define
$$n_k = N_1 + \cdots + N_k,$$

so $n_1 < n_2 < n_3 < \cdots$ is an increasing sequence of integers, and for all $k$, $n_k \ge N_k$. And now we claim that $\{v_{n_k}\}$
converges: after all,

$$\|v_{n_{k+1}} - v_{n_k}\| < 2^{-k}$$

(because of how we chose $n_k$ and $n_{k+1}$), and therefore the series

$$\sum_{k \in \mathbb{N}} (v_{n_{k+1}} - v_{n_k})$$

must be summable (it's absolutely summable because $\sum_{k} 2^{-k} = 1$, and we assumed that all absolutely summable
series are summable). Thus the sequence of partial sums

$$\sum_{k=1}^{m} (v_{n_{k+1}} - v_{n_k}) = v_{n_{m+1}} - v_{n_1}$$

converges in $V$, and adding $v_{n_1}$ to every term does not change convergence. Thus the sequence $\{v_{n_{m+1}}\}_{m=1}^{\infty}$ converges,
and we've found our convergent subsequence (meaning that the whole sequence indeed converges). This proves that
$V$ is Banach.
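The subsequence construction can be traced on a concrete Cauchy sequence. As an illustrative assumption (not from the lecture), take $V = \mathbb{R}$ and $v_n = 1/n$; then $N_k = 2^k$ works, since $|1/n - 1/m| < \max(1/n, 1/m) \le 2^{-k}$ whenever $n, m \ge 2^k$. The sketch below uses exact rational arithmetic to check that $n_k = N_1 + \cdots + N_k$ satisfies $n_k \ge N_k$ and that consecutive terms of the subsequence are within $2^{-k}$.

```python
from fractions import Fraction

def v(n):
    """Hypothetical Cauchy sequence v_n = 1/n in R."""
    return Fraction(1, n)

def N(k):
    """A valid choice of N_k for this sequence: N_k = 2**k."""
    return 2 ** k

def n_sub(k):
    """Index of the k-th subsequence term: n_k = N_1 + ... + N_k."""
    return sum(N(i) for i in range(1, k + 1))

for k in range(1, 11):
    gap = abs(v(n_sub(k + 1)) - v(n_sub(k)))
    # n_k >= N_k, so consecutive subsequence terms differ by less than 2**(-k).
    assert n_sub(k) >= N(k) and gap < Fraction(1, 2 ** k)
```

Because the gaps are dominated by the summable sequence $2^{-k}$, the telescoping series in the proof is absolutely summable.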


Now that we've appropriately characterized our vector spaces, we want to find the analog of matrices from linear 
algebra, which will lead us to operators and functionals. Here's a particular example to keep in mind (because it 


motivates a lot of the machinery that we'll be using): 


Example 20 


Let $K : [0,1] \times [0,1] \to \mathbb{C}$ be a continuous function. Then for any function $f \in C([0, 1])$, we can define

$$Tf(x) = \int_0^1 K(x, y) f(y) \, dy.$$

Maps of this form are basically inverses of differential operators, but we'll see that later on.

We can check that $Tf \in C([0, 1])$ (it's also continuous), and for any $\lambda_1, \lambda_2 \in \mathbb{C}$ and $f_1, f_2 \in C([0, 1])$, we have
$$T(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1 T f_1 + \lambda_2 T f_2$$


(linearity). We've already proven that C([0,1]) is a Banach space, so T here is going to be an example of a linear 


operator. 


Definition 21 
Let $V$ and $W$ be two vector spaces. A map $T : V \to W$ is linear if for all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$,

$$T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 T v_1 + \lambda_2 T v_2.$$


(We'll often use the phrase linear operator instead of “linear map” or “linear transformation.”) 


We'll be particularly curious about linear operators that are continuous: recall that a map $T : V \to W$ (not
necessarily linear) is continuous on $V$ if for all $v \in V$ and all sequences $\{v_n\}$ converging to $v$, we have $Tv_n \to Tv$.
(Equivalently, we can use the topological notion of continuity and say that for all open sets $U \subset W$, the inverse image
$$T^{-1}(U) = \{v \in V : Tv \in U\}$$


is open in V.) For linear maps, there's a way of characterizing whether a function is continuous on a normed space — 
in finite-dimensional vector spaces, all linear transformations are continuous, but this is not always true when we 


have a map between two Banach spaces. 


Theorem 22 


Let $V, W$ be two normed vector spaces. A linear operator $T : V \to W$ is continuous if and only if there exists
$C > 0$ such that for all $v \in V$, $\|Tv\|_W \le C\|v\|_V$.

In this case, we say that $T$ is a bounded linear operator, but that doesn't mean the image of $T$ is bounded
(the only such linear map is the zero map!). Instead, we're saying that bounded subsets of $V$ are always sent to
bounded subsets of $W$.


Proof. First, suppose that such a $C > 0$ exists (such that $\|Tv\|_W \le C\|v\|_V$ for all $v \in V$): we will prove continuity
by showing that $Tv_n \to Tv$ whenever $v_n \to v$. Start with a convergent sequence $v_n \to v$: then

$$\|Tv_n - Tv\|_W = \|T(v_n - v)\|_W$$
(by linearity of $T$), and now by our assumption, this can be bounded as

$$\le C\|v_n - v\|_V.$$

Since $\|v_n - v\|_V \to 0$, the squeeze theorem tells us that $\|Tv_n - Tv\|_W \to 0$ (since the norm is always nonnegative),
and thus $Tv_n \to Tv$.
For the other direction, suppose that $T$ is continuous. This time we'll describe continuity with the topological
characterization: the inverse image of every open set in $W$ is an open set in $V$, so in particular, the set
$$T^{-1}(B_W(0, 1)) = \{v \in V : Tv \in B_W(0, 1)\}$$

is an open set in $V$. Since $0$ is contained in $B_W(0, 1)$, and $T(0) = 0$, we must have $0 \in T^{-1}(B_W(0, 1))$, and (by
openness) we can find a ball of some radius $r > 0$ so that $B_V(0, r)$ is contained inside $T^{-1}(B_W(0, 1))$. This means
that the image of $B_V(0, r)$ is contained inside $B_W(0, 1)$.

Now, we claim we can take $C = \frac{2}{r}$. To show this, for any $v \in V \setminus \{0\}$ (the case $v = 0$ automatically satisfies the
inequality), we have the vector $\frac{r}{2\|v\|_V} v$, which has length $\frac{r}{2} < r$. This means that

$$\frac{r}{2\|v\|_V} v \in B_V(0, r) \implies T\left(\frac{r}{2\|v\|_V} v\right) \in B_W(0, 1)$$

(because $B_V(0, r)$ is all sent within $B_W(0, 1)$ under $T$), and thus

$$\left\|T\left(\frac{r}{2\|v\|_V} v\right)\right\|_W < 1 \implies \|T(v)\|_W < \frac{2}{r}\|v\|_V$$

by taking scalars out of $T$ and using homogeneity of the norm, and we're done.


The “boundedness property” above will become tedious to write down, so we won't use the subscripts from now 
on. (But we should be able to track which space we're thinking about just by thinking about domains and codomains 


of our operators.) 


Example 23 
The linear operator $T : C([0,1]) \to C([0,1])$ in Example 20 is indeed a bounded linear operator (and thus


continuous). 


We should be able to check that $T$ is linear in $f$ easily (because constants come out of the integral). To check
that it is bounded, recall that we're using the $C_\infty$ norm, so if we have a function $f \in C([0, 1])$,
$$\|f\|_\infty = \sup_{x \in [0,1]} |f(x)|$$

(and this supremum value will actually be attained somewhere, but that's not important). We can then estimate the
norm of $Tf$ by noting that for all $x \in [0, 1]$,

$$|Tf(x)| = \left|\int_0^1 K(x, y) f(y) \, dy\right| \le \int_0^1 |K(x, y)| \, |f(y)| \, dy$$

by the triangle inequality, and now we can bound $f$ and $K$ by their supremum (over $[0, 1]$ and $[0, 1] \times [0, 1]$, respectively)
to get

$$\le \int_0^1 |K(x, y)| \, \|f\|_\infty \, dy \le \int_0^1 \|K\|_\infty \|f\|_\infty \, dy = \|K\|_\infty \|f\|_\infty.$$

Since this bound holds for all $x$, it holds for the supremum also, and thus

$$\|Tf\|_\infty \le \|K\|_\infty \|f\|_\infty,$$

and we can use $C = \|K\|_\infty$ to show boundedness (and thus continuity). We will often refer to $K$ as a kernel.
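The estimate $\|Tf\|_\infty \le \|K\|_\infty \|f\|_\infty$ has an exact discrete analog that is easy to test. The sketch below replaces the integral by a midpoint Riemann sum with $n$ equal weights $1/n$; the kernel `K`, the input `f`, and the helper `apply_kernel` are hypothetical choices for illustration, and the suprema are taken over the grid only.

```python
import math

def apply_kernel(K, f, n=100):
    """Midpoint-rule approximation of (Tf)(x) = int_0^1 K(x, y) f(y) dy on a grid."""
    pts = [(j + 0.5) / n for j in range(n)]
    Tf = [sum(K(x, y) * f(y) for y in pts) / n for x in pts]
    return pts, Tf

def K(x, y):
    """Hypothetical continuous kernel on [0, 1] x [0, 1]."""
    return math.exp(-abs(x - y))

def f(y):
    """Hypothetical continuous input function."""
    return math.cos(2 * math.pi * y)

pts, Tf = apply_kernel(K, f)
sup_K = max(abs(K(x, y)) for x in pts for y in pts)
sup_f = max(abs(f(y)) for y in pts)
sup_Tf = max(abs(t) for t in Tf)

# Discrete version of ||Tf|| <= ||K|| ||f|| (small slack for rounding).
assert sup_Tf <= sup_K * sup_f + 1e-12
```

The same two steps as in the continuous argument are at work: the triangle inequality for the sum, then bounding each factor by its maximum.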


Definition 24 


Let V and W be two normed spaces. The set of bounded linear operators from V to W is denoted B(V,W). 


We can check that B(V,W) is a vector space — the sum of two linear operators is a linear operator, and so on. 


Furthermore, we can put a norm on this space: 


Definition 25 
The operator norm of an operator $T \in B(V, W)$ is defined by

$$\|T\| = \sup_{\|v\| = 1, \, v \in V} \|Tv\|.$$

This is indeed a finite number, because being bounded implies that
$$\|Tv\| \le C\|v\| = C$$

whenever $\|v\| = 1$, and the operator norm is the smallest such $C$ possible.


Theorem 26 


The operator norm is a norm, which means B(V,W) is a normed space. 


Proof. First, we show definiteness. The zero operator indeed has norm 0 (because $\|Tv\| = 0$ for all $v$). On the other
hand, suppose that $Tv = 0$ for all $\|v\| = 1$. Then rescaling tells us that $Tv' = \|v'\| \, T\left(\frac{v'}{\|v'\|}\right) = 0$ for all $v' \ne 0$,
so $T$ is indeed the zero operator.

Next, we can show homogeneity, which follows from the homogeneity of the norm on $W$. We have

$$\|\lambda T\| = \sup_{\|v\| = 1} \|\lambda T v\| = \sup_{\|v\| = 1} |\lambda| \, \|Tv\|,$$

and now we can pull the nonnegative constant $|\lambda|$ out of the supremum to get

$$= |\lambda| \sup_{\|v\| = 1} \|Tv\| = |\lambda| \, \|T\|.$$

Finally, the triangle inequality also follows from the triangle inequality on $W$: if $S, T \in B(V, W)$, and we have some
element $v \in V$ with $\|v\| = 1$, then

$$\|(S + T)v\| = \|Sv + Tv\| \le \|Sv\| + \|Tv\| \le \|S\| + \|T\|.$$

So taking the supremum of the left-hand side over all unit-length $v$ gives us $\|S + T\| \le \|S\| + \|T\|$, and we're done.


For example, if we return to the operator $T$ from Example 20, we notice that for any $f$ of unit length, we have
$$\|Tf\|_\infty \le \|K\|_\infty.$$
Therefore, $\|T\| \le \|K\|_\infty$. And in general, now that we've defined the operator norm, it gives us a bound of the form
$$\left\|T\left(\frac{v}{\|v\|}\right)\right\| \le \|T\| \implies \|Tv\| \le \|T\| \, \|v\|$$

for all $v \in V$ (not just those with unit length).
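In finite dimensions the operator norm can be computed by hand, which makes the definition concrete. The sketch below relies on a standard fact that is not proved in these notes: for a matrix acting on $(\mathbb{R}^n, \|\cdot\|_\infty)$, the operator norm equals the largest absolute row sum. The matrix `A` and the helper names are hypothetical.

```python
def sup_norm(v):
    """Max norm of a finite vector."""
    return max(abs(t) for t in v)

def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, v)) for row in A]

def op_norm_inf(A):
    """Operator norm of A on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

A = [[1.0, -2.0], [3.0, 4.0]]
# For this norm, unit vectors with entries +-1 are enough to attain the supremum.
best = max(sup_norm(mat_vec(A, [s, t])) for s in (-1.0, 1.0) for t in (-1.0, 1.0))
assert best == op_norm_inf(A) == 7.0

# The bound ||Av|| <= ||A|| ||v|| for a vector that is not of unit length.
v = [0.5, -3.0]
assert sup_norm(mat_vec(A, v)) <= op_norm_inf(A) * sup_norm(v)
```

In infinite dimensions the supremum need not be attained by any unit vector, so it generally cannot be found by this kind of finite search.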


Since we have a normed vector space, it’s natural to ask for completeness, which we get in the following way: 




Theorem 27 


If V is a normed vector space and W is a Banach space, then B(V,W) is a Banach space. 


Proof. We'll use the characterization given in Theorem 19. Suppose that $\{T_n\}$ is a sequence of bounded linear
operators in $B(V, W)$ such that
$$C = \sum_{n} \|T_n\| < \infty.$$

(In other words, we have an absolutely summable series of linear operators.) Then we need to show that $\sum_n T_n$ is
summable, and we'll do this in a similar way to how we showed that the space $C_\infty(X)$ was Banach: we'll come up
with a bounded linear operator and show that we have convergence in the operator norm.


Our candidate will be obtained as follows: for any $v \in V$ and $m \in \mathbb{N}$, we know that

$$\sum_{n=1}^{m} \|T_n v\| \le \sum_{n=1}^{m} \|T_n\| \, \|v\| \le \|v\| \sum_{n} \|T_n\| = C\|v\|.$$

Thus, the sequence of partial sums of nonnegative real numbers $\sum_{n=1}^{m} \|T_n v\|$ is bounded and thus convergent. Since
$T_n v \in W$ for each $n$, we've shown that the series $\sum_n T_n v$ is absolutely summable in $W$, and thus (because $W$ is Banach)
$\sum_n T_n v$ is summable as well. So we can define the “sum of the $T_n$s,” $T : V \to W$, by defining

$$Tv = \lim_{m \to \infty} \sum_{n=1}^{m} T_n v$$

(because this limit does indeed exist). We now need to show that this candidate is a bounded linear operator.


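Linearity follows because for all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$,
$$T(\lambda_1 v_1 + \lambda_2 v_2) = \lim_{m \to \infty} \sum_{n=1}^{m} T_n(\lambda_1 v_1 + \lambda_2 v_2),$$

and now because each $T_n$ is linear, this is

$$= \lim_{m \to \infty} \left(\lambda_1 \sum_{n=1}^{m} T_n v_1 + \lambda_2 \sum_{n=1}^{m} T_n v_2\right).$$

Now each of the sums converges as we want, since the sum of the limits is the limit of the sums:
$$= \lambda_1 T v_1 + \lambda_2 T v_2.$$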


(The proof that the sum of two convergent sequences also converges to the sum of the limits is the same as it is in 
R, except that we replace absolute values with norms.) 


Next, to prove that this linear operator $T$ is bounded, consider any $v \in V$. Then

$$\|Tv\| = \left\|\lim_{m \to \infty} \sum_{n=1}^{m} T_n v\right\|,$$
and limits and norms interchange, so this is also

$$= \lim_{m \to \infty} \left\|\sum_{n=1}^{m} T_n v\right\| \le \lim_{m \to \infty} \sum_{n=1}^{m} \|T_n v\|$$

by the triangle inequality. But now this is bounded by
$$\le \sum_{n=1}^{\infty} \|T_n\| \, \|v\| = C\|v\|,$$

where $C$ is finite by assumption (because we have an absolutely summable series). So we've verified that $T$ is a
bounded linear operator in $B(V, W)$.

It remains to show that $\sum_{n=1}^{m} T_n$ actually converges to $T$ in the operator norm (as $m \to \infty$). If we consider some
$v \in V$ with $\|v\| = 1$, then

$$\left\|Tv - \sum_{n=1}^{m} T_n v\right\| = \left\|\lim_{m' \to \infty} \sum_{n=1}^{m'} T_n v - \sum_{n=1}^{m} T_n v\right\| = \left\|\lim_{m' \to \infty} \sum_{n=m+1}^{m'} T_n v\right\|,$$

and now we can bring the norm inside the limit and then use the triangle inequality to get

$$\le \lim_{m' \to \infty} \sum_{n=m+1}^{m'} \|T_n v\| \le \lim_{m' \to \infty} \sum_{n=m+1}^{m'} \|T_n\|$$

(because $v$ has unit length). And now this is a series of nonnegative real numbers

$$= \sum_{n=m+1}^{\infty} \|T_n\|,$$

and thus we note that (taking the supremum over all unit-length $v$)

$$\left\|T - \sum_{n=1}^{m} T_n\right\| \le \sum_{n=m+1}^{\infty} \|T_n\| \to 0$$


because we have the tail of a convergent series of real numbers. So indeed we have convergence in the operator norm 


as desired. 
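The tail estimate $\|T - \sum_{n \le m} T_n\| \le \sum_{n > m} \|T_n\|$ can be watched in a toy case. As an illustrative assumption, take $V = W = \mathbb{R}^2$ with the max norm, a fixed matrix $S$, and $T_n = 2^{-n} S$, so that $\sum_n \|T_n\| = \|S\|$ and the limit operator is $T = S$. The sketch uses the standard row-sum formula for the operator norm in this setting; all names are hypothetical.

```python
def op_norm_inf(A):
    """Operator norm on (R^2, sup norm): largest absolute row sum (standard fact)."""
    return max(sum(abs(a) for a in row) for row in A)

S = [[1.0, -2.0], [3.0, 4.0]]  # hypothetical fixed operator

def T_n(n):
    """n-th term T_n = 2**(-n) * S, so that the norms sum to ||S|| over n >= 1."""
    return [[a / 2 ** n for a in row] for row in S]

def partial_sum(m):
    """Entrywise sum T_1 + ... + T_m."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(1, m + 1):
        Tn = T_n(n)
        P = [[P[i][j] + Tn[i][j] for j in range(2)] for i in range(2)]
    return P

C = op_norm_inf(S)  # equals the sum of ||T_n|| over all n >= 1
for m in (1, 5, 20):
    P = partial_sum(m)
    err = op_norm_inf([[S[i][j] - P[i][j] for j in range(2)] for i in range(2)])
    tail = C - sum(op_norm_inf(T_n(n)) for n in range(1, m + 1))
    # Error of the m-th partial sum is controlled by the tail of the norm series.
    assert err <= tail + 1e-12
```

In this toy case the bound is actually an equality, because every term is a multiple of the same operator; in general only the inequality holds.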


Definition 28 
Let V be a normed vector space (over K). Then V’ = B(V,K) is called the dual space of V, and because 


K = R,C are both complete, V’ is then a Banach space by Theorem 27. An element of the dual space B(V, K) 


is called a functional. 


We can actually identify the dual space for the $\ell^p$ spaces with $1 \le p < \infty$: it turns out that
$$(\ell^p)' = \ell^{p'},$$

where $p, p'$ satisfy the relation $\frac{1}{p} + \frac{1}{p'} = 1$. So the dual of $\ell^1$ is $\ell^\infty$, and the dual of $\ell^2$ is itself (this is the only $\ell^p$
space for which this is true), but the dual of $\ell^\infty$ is not actually $\ell^1$. (Life would be a lot easier if this were true, and
this headache will come up in the $L^p$ spaces as well.)


3. February 23, 2021 


Last time, we introduced the space of bounded linear operators between two normed spaces, B(V,W), and we proved 


that this space is Banach when W is Banach. Today, we'll start seeing other ways to get normed spaces from other 
normed spaces, namely subspaces and quotients.


We should recall this definition from linear algebra: 


Definition 29 
Let $V$ be a vector space. A subset $W \subset V$ is a subspace of $V$ if for all $w_1, w_2 \in W$ and $\lambda_1, \lambda_2 \in K$, we have
$\lambda_1 w_1 + \lambda_2 w_2 \in W$ (that is, closure under linear combinations).


Proposition 30 
A subspace W of a Banach space V is Banach (with norm inherited from V) if and only if W is a closed subset 


of V (with respect to the metric induced by the norm). 


Proof sketch. If $W$ is Banach, the idea is to show that every sequence of elements in $W$ that converges (to something
in $V$) actually converges in $W$, and we show this by noticing that the sequence must be Cauchy, meaning that (by
completeness of $W$) it has a limit in $W$, and then we use uniqueness of limits.

For the other direction, if W is closed, then any Cauchy sequence in W is also a Cauchy sequence in V, so it has 


a limit. Closedness tells us that the limit is in W, so every Cauchy sequence has a limit in W, which proves that it is 


Banach. 


Definition 31 
Let $W \subset V$ be a subspace of $V$. Define the equivalence relation on $V$ via

$$v \sim v' \iff v - v' \in W$$

and let $[v]$ be the equivalence class of $v$ (the set of $v' \in V$ such that $v \sim v'$). Then the quotient space $V/W$ is
the set of all equivalence classes $\{[v] : v \in V\}$.


We can check that the usual conditions for an equivalence relation are satisfied: 

* Reflexivity: $v \sim v$ for all $v \in V$ (because $0 \in W$).

* Symmetry: $v \sim v'$ if and only if $v' \sim v$ (because $w \in W \implies -w \in W$).

* Transitivity: if $v \sim v'$ and $v' \sim v''$, then $v \sim v''$ (because of closure under addition in $W$).


We will typically denote [v] as v+W (using the algebra coset notation), since all elements in the equivalence class 


of $v$ are $v$ plus some element of $W$. And with this notation, we have (for any $v_1, v_2 \in V$)
$$(v_1 + W) + (v_2 + W) = (v_1 + v_2) + W,$$
and (for any $\lambda \in K$)
$$\lambda(v + W) = \lambda v + W.$$

We do need to check that these operations are well-defined (that is, the resulting equivalence class of the operations
is independent of the representative of $v + W$), but that's something that we checked in linear algebra (or can check
on our own). We typically pronounce $V/W$ “V mod W,” and in particular $W = 0 + W = w + W$ for any $w \in W$.

We introduced the concept of a seminorm when we defined a normed vector space — basically, seminorms satisfy 


all of the same assumptions as norms except definiteness (so nonzero vectors can have seminorm 0). 
Example 32

Consider the function which assigns the real number $\sup |f'|$ to a differentiable function $f$: this satisfies homogeneity and the
triangle inequality (so it is a seminorm), but it is not a norm because the derivative of any constant function is 0.


But the constant functions form a subspace, and this next result is basically talking about how we can mod out by 


that subspace: 


Theorem 33 
Let $\|\cdot\|$ be a seminorm on a vector space $V$. If we define $E = \{v \in V : \|v\| = 0\}$, then $E$ is a subspace of $V$,
and the function on $V/E$ defined by

$$\|v + E\|_{V/E} = \|v\|$$

for any $v + E \in V/E$ defines a norm.


Proof. First of all, $E$ is a subspace because (by homogeneity and the triangle inequality)
$$\|\lambda_1 v_1 + \lambda_2 v_2\| \le |\lambda_1| \, \|v_1\| + |\lambda_2| \, \|v_2\| = 0$$

for any $v_1, v_2 \in E$ and $\lambda_1, \lambda_2 \in K$, and because a seminorm is always nonnegative, we must have $\|\lambda_1 v_1 + \lambda_2 v_2\| = 0$
(and thus $\lambda_1 v_1 + \lambda_2 v_2 \in E$).

This means that $V/E$ is indeed a valid quotient space, and now we must show that our function is well-defined (in
other words, that it doesn't depend on the representative from our equivalence class). Formally, that means that
we need to check that if $v + E = v' + E$, then $\|v\| = \|v'\|$. And we can do this with the triangle inequality: since
$v + E = v' + E$, there exists some $e \in E$ such that $v = v' + e$, so

$$\|v\| = \|v' + e\| \le \|v'\| + \|e\| = \|v'\|$$

by the triangle inequality. But this argument is also true if we swap the roles of $v$ and $v'$, so it's also true that
$\|v'\| \le \|v\|$, and thus their seminorms must actually be equal.
Checking that this function is actually a norm on V/E is now left as an exercise to us: the properties of homogeneity 


and triangle inequality follow because || - || is already a seminorm, and definiteness comes because everything that 


evaluates to 0 is in the equivalence class 0+ E. 
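A tiny finite-dimensional instance may help here. As an illustrative assumption (not from the lecture), take $V = \mathbb{R}^2$ with the seminorm $\|(x, y)\| = |x|$, so that $E = \{(0, y)\}$ is the second coordinate axis. The sketch checks that the seminorm does not depend on which representative of $v + E$ we pick.

```python
def seminorm(v):
    """Seminorm on R^2 that ignores the second coordinate: ||(x, y)|| = |x|."""
    return abs(v[0])

def add(v, w):
    """Componentwise sum of two vectors in R^2."""
    return (v[0] + w[0], v[1] + w[1])

v = (3.0, 1.0)
# Elements of E = {e : seminorm(e) == 0} are exactly the vectors (0, y).
for e in [(0.0, -5.0), (0.0, 2.5), (0.0, 100.0)]:
    assert seminorm(e) == 0.0
    # Every representative v + e of the class v + E has the same seminorm.
    assert seminorm(add(v, e)) == seminorm(v)

# Definiteness on V/E: a vector of seminorm zero lies in E, i.e. its class is 0 + E.
assert seminorm((0.0, 7.0)) == 0.0
```

Here $V/E$ can be identified with $\mathbb{R}$ via the first coordinate, and the induced norm is just the absolute value.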


So identifying the subspace of all zero-norm elements gives us a normed space, but we can also start with a normed
space $V$ and consider some closed subspace $W$ of that normed space. Then $V/W$ is a new normed space, which will be
left as an exercise for us.

With that, we've concluded the “bare-bones” part of functional analysis, and we're now ready to get into some 
fundamental results related to Banach spaces. (In other words, the theorems will now have names attached to them, 


and we should be able to recognize the names.) First, we'll need a result from metric space theory: 


Theorem 34 (Baire Category Theorem) 


Let $M$ be a complete metric space, and let $\{C_n\}_n$ be a collection of closed subsets of $M$ such that $M = \bigcup_{n \in \mathbb{N}} C_n$.
Then at least one of the $C_n$ contains an open ball $B(x, r) = \{y \in M : d(x, y) < r\}$. (In other words, at least one
$C_n$ has an interior point.)
(This theorem doesn't have anything to do with category theory, despite the name.) Sometimes in applying this
theorem, we take $C_n$ to not necessarily be closed, and then the result is that one of their closures must contain an
open ball. In other words, we can't have all of the $C_n$ be nowhere dense.


Remark 35. This theorem is pretty powerful — it can actually be used to prove that there is a continuous function 


which is nowhere differentiable. 


Proof. Suppose for the sake of contradiction that there is some collection of closed subsets $C_n$ that are all nowhere
dense such that $\bigcup_n C_n = M$. We'll prove that there's a point not contained in any of the $C_n$s, using completeness,
below.

To do this, we'll construct a sequence inductively. Because $M$ contains at least one open ball, and $C_1$ cannot
contain an open ball, this means that $M \ne C_1$, and thus there is some $p_1 \in M \setminus C_1$. Because $C_1$ is closed, $M \setminus C_1$ is
open, and thus there is some $\varepsilon_1 > 0$ such that $B(p_1, \varepsilon_1) \cap C_1 = \emptyset$.

Now, $B(p_1, \frac{\varepsilon_1}{3})$ is not contained in $C_2$ (because the closed set $C_2$ is assumed to not contain any open ball), and
thus there exists some point $p_2 \in B(p_1, \frac{\varepsilon_1}{3})$ such that $p_2 \notin C_2$. Because $C_2$ is closed, we can then find some $\varepsilon_2 < \frac{\varepsilon_1}{3}$
such that $B(p_2, \varepsilon_2) \cap C_2 = \emptyset$.

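More generally (we'll be explicit this time but cover this in less detail in the future), suppose we have constructed points $p_1, \cdots, p_k$ and constants $\varepsilon_1, \cdots, \varepsilon_k$ such that $\varepsilon_k < \frac{\varepsilon_{k-1}}{3} < \cdots < \frac{\varepsilon_1}{3^{k-1}}$ and with the constraints that

$$p_j \in B\left(p_{j-1}, \frac{\varepsilon_{j-1}}{3}\right), \qquad B(p_j, \varepsilon_j) \cap C_j = \varnothing$$

for all $j$ (the first one for $j \ge 2$). Then we construct $p_{k+1}$ as follows: because $B(p_k, \varepsilon_k/3)$ is not contained in $C_{k+1}$, there exists an element $p_{k+1} \in B(p_k, \varepsilon_k/3)$ such that $p_{k+1} \notin C_{k+1}$. Then we can pick some $\varepsilon_{k+1} < \varepsilon_k/3$ so that $B(p_{k+1}, \varepsilon_{k+1}) \cap C_{k+1} = \varnothing$ (because $M \setminus C_{k+1}$ is open). So by induction we get a sequence of points $\{p_k\}$ in $M$ and a sequence of numbers $\varepsilon_k \in (0, \varepsilon_1]$, such that the two displayed (boxed) statements above are satisfied.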

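This sequence is Cauchy, basically because we've made our $\varepsilon_k$ decrease fast enough: for all $k, \ell \in \mathbb{N}$, repeated iterations of the triangle inequality give us

$$d(p_k, p_{k+\ell}) \le d(p_k, p_{k+1}) + d(p_{k+1}, p_{k+2}) + \cdots + d(p_{k+\ell-1}, p_{k+\ell}).$$

And now by the first boxed statement, which says that $p_{j+1} \in B(p_j, \varepsilon_j/3)$, we can bound this as

$$d(p_k, p_{k+\ell}) < \frac{\varepsilon_k}{3} + \frac{\varepsilon_{k+1}}{3} + \cdots + \frac{\varepsilon_{k+\ell-1}}{3}.$$

Since $\varepsilon_m \le \varepsilon_1 / 3^{m-1}$ for every $m$, this sum can be bounded by the infinite geometric series

$$d(p_k, p_{k+\ell}) < \varepsilon_1 \sum_{m=k}^{\infty} 3^{-m} = \frac{\varepsilon_1}{2} \cdot 3^{-k+1},$$

and thus making $k$ large enough bounds this independently of $\ell$. So the sequence of points $\{p_k\}$ is Cauchy, and because $M$ is complete, there exists some $p \in M$ such that $p_k \to p$.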
And now we can show that $p$ doesn't lie in any of the $C_k$ (which is a contradiction) by showing that it lives in all of the balls $B(p_k, \varepsilon_k)$. This is because for all $k, \ell \in \mathbb{N}$, we have

$$d(p_{k+1}, p_{k+1+\ell}) < \varepsilon_{k+1}\left(\frac{1}{3} + \frac{1}{3^2} + \cdots + \frac{1}{3^{\ell}}\right) < \varepsilon_{k+1} \sum_{n=1}^{\infty} 3^{-n} = \frac{\varepsilon_{k+1}}{2}.$$

So taking the limit as $\ell \to \infty$, we have

$$d(p_{k+1}, p) \le \frac{\varepsilon_{k+1}}{2} < \frac{\varepsilon_k}{6},$$

and thus

$$d(p_k, p) \le d(p_k, p_{k+1}) + d(p_{k+1}, p) < \frac{\varepsilon_k}{3} + \frac{\varepsilon_k}{6} < \varepsilon_k.$$

So $p \in B(p_k, \varepsilon_k)$ for each $k$, and each of these balls is disjoint from $C_k$. So $p$ is not in any $C_k$, meaning $p \notin \bigcup_k C_k = M$, which is a contradiction.
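The inductive step of this proof can be traced on a concrete case. The sketch below is an illustration under assumptions that are not part of the lecture: it takes $M = \mathbb{R}$, lets each closed nowhere dense set be a single rational point $C_j = \{c_j\}$, and uses a hypothetical helper `nested_balls` with exact `Fraction` arithmetic. It runs only finitely many steps, so it illustrates the construction of the $p_k$ and $\varepsilon_k$, not the limit argument.

```python
from fractions import Fraction

def nested_balls(points):
    """Follow the proof's induction with C_j = {points[j]} inside M = R.

    Returns (centers, radii) such that B(centers[j], radii[j]) misses points[j],
    centers[j+1] lies in B(centers[j], radii[j]/3), and radii[j+1] < radii[j]/3.
    """
    centers, radii = [], []
    for c in points:
        if not centers:
            p = c + 1                 # any point outside C_1
            eps = Fraction(1, 2)      # B(p, 1/2) misses c because |p - c| = 1
        else:
            prev_p, prev_eps = centers[-1], radii[-1]
            # Keep the old center unless it lies in the next set;
            # in that case nudge it, staying inside B(prev_p, prev_eps/3).
            p = prev_p if prev_p != c else prev_p + prev_eps / 6
            # Radius below prev_eps/3 and small enough that B(p, eps) misses c.
            eps = min(prev_eps / 4, abs(p - c) / 2)
        centers.append(p)
        radii.append(eps)
    return centers, radii

# The rationals k/n in [0, 1] with n <= 6, listed with repetition.
points = [Fraction(k, n) for n in range(1, 7) for k in range(n + 1)]
centers, radii = nested_balls(points)
last = centers[-1]

# The last center lies in every ball constructed so far ...
print(all(abs(last - p) < e for p, e in zip(centers, radii)))
# ... and therefore avoids every point handled so far.
print(all(last != c for c in points))
```

In the proof, the role of `last` is played by the limit $p$, which exists by completeness and lies in all of the balls at once.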


And we can use this to prove some results in functional analysis now: 


Theorem 36 (Uniform Boundedness Theorem) 


Let $B$ be a Banach space, and let $\{T_n\}$ be a sequence in $\mathcal{B}(B, V)$ (of bounded linear operators from $B$ into some normed space $V$). Then if for all $b \in B$ we have $\sup_n \|T_n b\| < \infty$ (that is, this sequence is pointwise bounded), then $\sup_n \|T_n\| < \infty$ (the operator norms are bounded).


Proof. For each $k \in \mathbb{N}$, define the subset

$$C_k = \{b \in B : \|b\| \le 1, \ \sup_n \|T_n b\| \le k\}.$$

This set is closed, because for any sequence $\{b_n\} \subset C_k$ with $b_n \to b$, we have $\|b\| = \lim_{n \to \infty} \|b_n\| \le 1$, and for all $m \in \mathbb{N}$, we have

$$\|T_m b\| = \lim_{n \to \infty} \|T_m b_n\|$$

(using the fact that these operators are bounded and thus continuous). And now $\|T_m b_n\| \le k$ because $b_n \in C_k$, so the limit must also be at most $k$.

But we have

$$\{b \in B : \|b\| \le 1\} = \bigcup_{k \in \mathbb{N}} C_k,$$

because for any $b \in B$, there is some $k$ such that $\sup_m \|T_m b\| \le k$ (by assumption). And now the left-hand side is a complete metric space, because it is a closed subset of $B$, and thus by Baire's theorem, one of the $C_k$ contains an open ball $B(b_0, \delta_0)$.

So now for any $b \in B(0, \delta_0)$ (meaning that $\|b\| < \delta_0$), we know that $b_0 + b \in B(b_0, \delta_0) \subset C_k$, so

$$\sup_n \|T_n(b_0 + b)\| \le k.$$

But then

$$\sup_n \|T_n b\| = \sup_n \|-T_n b_0 + T_n(b_0 + b)\| \le \sup_n \|T_n b_0\| + \sup_n \|T_n(b_0 + b)\| \le k + k,$$

because $b_0$ and $b_0 + b$ are both in $B(b_0, \delta_0)$. So any $b$ in the open ball $B(0, \delta_0)$ satisfies $\sup_n \|T_n b\| \le 2k$, and rescaling means that for any $n \in \mathbb{N}$ and for all $b \in B$ with $\|b\| = 1$, we have

$$\left\|T_n\left(\frac{\delta_0}{2} b\right)\right\| \le 2k \implies \|T_n b\| \le \frac{4k}{\delta_0},$$

meaning that the operator norm of $T_n$ is at most $\frac{4k}{\delta_0}$ for all $n$, and thus $\sup_n \|T_n\| \le \frac{4k}{\delta_0}$, and we're done.
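Completeness of the domain cannot be dropped from this theorem. As a sketch of a standard counterexample (it is not taken from the lecture, and the function names are illustrative), consider the space of finitely supported sequences with the sup norm, which is not complete, together with the functionals $T_n(x) = n x_n$. Each fixed $x$ gives a bounded set of values, but the operator norms grow without bound.

```python
def T(n, x):
    """T_n(x) = n * x_n for a finitely supported sequence x.

    x is a list; entries past the end of the list are treated as 0.
    """
    return n * x[n - 1] if n <= len(x) else 0

def sup_norm(x):
    return max((abs(t) for t in x), default=0)

def unit_vector(n):
    """e_n: 1 in position n, 0 elsewhere."""
    return [0] * (n - 1) + [1]

x = [3, -1, 4, 1, -5]
# Pointwise bounded: T_n(x) = 0 once n exceeds the support of x,
# so the supremum over all n is a maximum over finitely many values.
pointwise_sup = max(abs(T(n, x)) for n in range(1, len(x) + 1))
print(pointwise_sup)

# But |T_n(e_n)| = n while ||e_n|| = 1, so ||T_n|| >= n is unbounded in n.
ratios = [abs(T(n, unit_vector(n))) / sup_norm(unit_vector(n)) for n in (1, 10, 100)]
print(ratios)
```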


4 February 25, 2021 


Last time, we proved the Uniform Boundedness Theorem from the Baire Category Theorem, and we'll continue to prove some “theorems with names” in functional analysis today.

Theorem 37 (Open Mapping Theorem)

Let $B_1, B_2$ be two Banach spaces, and let $T \in \mathcal{B}(B_1, B_2)$ be a surjective linear operator. Then $T$ is an open map, meaning that for all open subsets $U \subset B_1$, $T(U)$ is open in $B_2$.


Proof. We'll begin by proving a specialized result: we'll show that the image of the open ball $B_1(0,1) = \{b \in B_1 : \|b\| < 1\}$ contains an open ball in $B_2$ centered at 0. (Then we'll use linearity to shift and scale these balls accordingly.)

Because $T$ is surjective, everything in $B_2$ is mapped onto, meaning that

$$B_2 = \bigcup_{n \in \mathbb{N}} \overline{T(B(0, n))}$$

(because any element of $B_1$ is at a finite distance from 0, it must be contained in one of the balls). Now we've written $B_2$ as a union of closed sets, so by Baire, there exists some $n_0 \in \mathbb{N}$ such that $\overline{T(B(0, n_0))}$ contains an open ball. But $T$ is a linear operator, so this is the same set as $n_0 \overline{T(B(0, 1))}$ (we can check that closure respects scaling and so on). So we have an open ball inside $\overline{T(B(0, 1))}$. Restated, there exists some point $v_0 \in B_2$ and some radius $r > 0$ such that $B(v_0, 4r)$ is contained in $\overline{T(B(0,1))}$ (the choice of 4 will make arithmetic easier later).

And we want a point that's actually in the image of $B(0,1)$ (not just the closure), so we pick a point $v_1 = Tu_1 \in T(B(0,1))$ such that $\|v_0 - v_1\| < 2r$. (The idea here is that points in the closure of $T(B(0,1))$ are arbitrarily close to points actually in $T(B(0,1))$.) Now $B(v_1, 2r)$ is entirely contained in $B(v_0, 4r)$, which is contained in $\overline{T(B(0, 1))}$, and now we'll show that this closure contains an open ball centered at 0 (which is pretty close to what we want). For any $\|v\| < r$, we have

$$\frac{1}{2}(2v + v_1) \in \frac{1}{2} B(v_1, 2r) \subset \frac{1}{2}\overline{T(B(0, 1))} = \overline{T\left(B\left(0, \tfrac{1}{2}\right)\right)},$$

and thus $v = -T\left(\frac{u_1}{2}\right) + \frac{1}{2}(2v + v_1)$ is an element of $-T\left(\frac{u_1}{2}\right) + \overline{T(B(0, \frac{1}{2}))}$ (this is not an equivalence class: it's the set of elements of $\overline{T(B(0, \frac{1}{2}))}$ all shifted by $-T\left(\frac{u_1}{2}\right)$), and now by linearity this means that our element $v$ must be in the set $\overline{T\left(-\frac{u_1}{2} + B(0, \frac{1}{2})\right)}$. But we chose $u_1$ to have norm less than 1, so $-\frac{u_1}{2}$ and any element of $B(0, \frac{1}{2})$ must both have norm less than $\frac{1}{2}$ (and their sum has norm less than 1). Thus, this set must be contained in $\overline{T(B(0, 1))}$, and therefore the ball of radius $r$, $B(0, r)$ (in $B_2$), is contained in $\overline{T(B(0, 1))}$.
But by scaling, we find that $B(0, 2^{-n} r) = 2^{-n} B(0, r)$ is contained in $2^{-n}\overline{T(B(0, 1))} = \overline{T(B(0, 2^{-n}))}$ (repeatedly using homogeneity), and now we'll use that fact to prove that $B(0, \frac{r}{2})$ is contained in $T(B(0,1))$ (finally removing the closure and proving the specialized result). To do that, take some $\|v\| < \frac{r}{2}$; we know that (plugging in $n = 1$) $v \in \overline{T(B(0, \frac{1}{2}))}$. So there exists some $b_1 \in B(0, \frac{1}{2})$ in $B_1$ such that $\|v - Tb_1\| < \frac{r}{4}$ (this is the same idea as above that points in the closure are arbitrarily close to points in the actual set). Then taking $n = 2$, we know that $v - Tb_1 \in \overline{T(B(0, \frac{1}{4}))}$, so there is some $b_2 \in B(0, \frac{1}{4})$ such that $\|v - Tb_1 - Tb_2\| < \frac{r}{8}$. Continue iterating this for larger and larger $n$, so that we have a sequence $\{b_k\}$ of elements in $B_1$ such that $\|b_k\| < 2^{-k}$ and

$$\left\| v - \sum_{k=1}^{n} Tb_k \right\| < 2^{-n-1} r.$$

And now the series $\sum_k b_k$ is absolutely summable, and because $B_1$ is a Banach space, that means that the series is summable, and we have $b \in B_1$ such that $b = \sum_{k=1}^{\infty} b_k$. And

$$\|b\| = \lim_{n \to \infty} \left\| \sum_{k=1}^{n} b_k \right\| \le \lim_{n \to \infty} \sum_{k=1}^{n} \|b_k\|$$

by the triangle inequality, and then we can bound this as

$$= \sum_{k=1}^{\infty} \|b_k\| < \sum_{k=1}^{\infty} 2^{-k} = 1.$$

Furthermore, because $T$ is a (bounded, thus) continuous operator,

$$Tb = \lim_{n \to \infty} T\left(\sum_{k=1}^{n} b_k\right) = \lim_{n \to \infty} \sum_{k=1}^{n} Tb_k = v,$$

because we chose our $b_k$ so that $\|v - Tb_1 - Tb_2 - \cdots - Tb_n\|$ converges to 0. Therefore, since $b \in B(0,1)$, $v \in T(B(0,1))$, and that means the ball $B(0, \frac{r}{2})$ in $B_2$ is indeed contained in $T(B(0, 1))$.


We've basically shown now that 0 remains an interior point if it started as one, and now we'll finish with some translation arguments: if a set $U \subset B_1$ is open, and $b_2 = Tb_1$ is some arbitrary point in $T(U)$, then (by openness of $U$) there exists some $\varepsilon > 0$ such that $b_1 + B(0, \varepsilon) = B(b_1, \varepsilon)$ is contained in $U$. Furthermore, by our work above, there exists some $\delta$ so that $B(0, \delta) \subset T(B(0,1))$. So this means that

$$B(b_2, \varepsilon\delta) = b_2 + \varepsilon B(0, \delta) \subset b_2 + \varepsilon T(B(0, 1)) = T(b_1) + \varepsilon T(B(0, 1)) = T(b_1 + B(0, \varepsilon)).$$

But $b_1 + B(0, \varepsilon)$ is contained in $U$, so indeed we've found a ball around our arbitrary $b_2$ contained in $T(U)$, and this proves the desired result.


Corollary 38 


If $B_1, B_2$ are two Banach spaces, and $T \in \mathcal{B}(B_1, B_2)$ is a bijective map, then $T^{-1}$ is in $\mathcal{B}(B_2, B_1)$.

Proof. We know that $T^{-1}$ is continuous if and only if for all open $U \subset B_1$, the inverse image of $U$ by $T^{-1}$ (which is $T(U)$) is open. And this is true by the Open Mapping Theorem.

From the Open Mapping Theorem, we get an almost topological result, which gives sufficient conditions for continuity of a linear operator. But first we need to state another result:


Proposition 39 


If $B_1, B_2$ are Banach spaces, then $B_1 \times B_2$ (with operations done entry by entry) with norm

$$\|(b_1, b_2)\| = \|b_1\| + \|b_2\|$$

is a Banach space.

(This proof is left as an exercise: we just need to check all of the definitions, and a Cauchy sequence in $B_1 \times B_2$ will consist of a Cauchy sequence in each of the individual spaces $B_1$ and $B_2$. So it's kind of similar to proving completeness of $\mathbb{R}^2$.)


Theorem 40 (Closed Graph Theorem) 
Let $B_1, B_2$ be two Banach spaces, and let $T : B_1 \to B_2$ be a (not necessarily bounded) linear operator. Then $T \in \mathcal{B}(B_1, B_2)$ if and only if the graph of $T$, defined as

$$\Gamma(T) = \{(u, Tu) : u \in B_1\},$$

is closed in $B_1 \times B_2$.

This can sometimes be easier or more convenient to check than the boundedness criterion for continuity. Normally, proving continuity means that we need to show that for a sequence $\{u_n\}$ converging to $u$, $Tu_n$ converges and its limit is equal to $Tu$. But the Closed Graph Theorem eliminates one of the steps: proving that the graph is closed means that given a sequence $u_n \to u$ with $Tu_n \to v$, we must show that $v = Tu$ (in other words, we just need to show that the limit is correct, without having to prove that $Tu_n$ converges in the first place)!


Proof. For the forward direction, suppose that $T$ is a bounded linear operator (and thus continuous). Then if $(u_n, Tu_n)$ is a sequence in $\Gamma(T)$ with $u_n \to u$ and $Tu_n \to v$, we need to show that $(u, v)$ is in the graph. But

$$v = \lim_{n \to \infty} Tu_n = T\left(\lim_{n \to \infty} u_n\right) = Tu,$$

and thus $(u, v)$ is in the graph and we've proven closedness.

For the other direction, consider the following commutative diagram:

                \Gamma(T)
        \pi_1 /           \ \pi_2
             v             v
           B_1 ----T----> B_2

Here, $\pi_1$ and $\pi_2$ are the projection maps from the graph down to $B_1$ and $B_2$ (meaning that $\pi_1(u, Tu) = u$ and $\pi_2(u, Tu) = Tu$). We want to construct a map $S : B_1 \to \Gamma(T)$ (so that $T = \pi_2 \circ S$), and we do so as follows. Since $\Gamma(T)$ is (by assumption) a closed subspace of $B_1 \times B_2$, which is a Banach space, $\Gamma(T)$ must be a Banach space as well. And now $\pi_1, \pi_2$ are continuous maps from the Banach space $\Gamma(T)$ to $B_1, B_2$ respectively, so $\pi_1$ is a bounded linear operator in $\mathcal{B}(\Gamma(T), B_1)$, and similarly $\pi_2 \in \mathcal{B}(\Gamma(T), B_2)$ (we can see this through the calculation $\|\pi_2(u, v)\| = \|v\| \le \|u\| + \|v\| = \|(u, v)\|$, for example). Furthermore, $\pi_1 : \Gamma(T) \to B_1$ is actually bijective (because there is exactly one point in the graph for each $u$), so by Corollary 38, it has an inverse $S : B_1 \to \Gamma(T)$ which is a bounded linear operator.

And now $T = \pi_2 \circ S$ is the composition of two bounded linear operators, so it is also a bounded linear operator.


Remark 41. The Open Mapping Theorem implies the Closed Graph Theorem, but we can also show the converse (so 


the two are logically equivalent). 


Each of the results so far has been trying to answer a question, and our next result, the Hahn-Banach Theorem, is asking whether the dual space of a general nontrivial normed space is trivial. (In other words, we want to know whether there are any normed spaces whose space of functionals $\mathcal{B}(V, \mathbb{K})$ only contains the zero function.) For example, we mentioned that for any finite $p > 1$, $\ell^p$ and $\ell^q$ are dual if $\frac{1}{p} + \frac{1}{q} = 1$, and it's also true that $(c_0)' = \ell^1$. So Hahn-Banach will tell us that the dual space has “a lot of elements,” but first we'll need an intermediate result from set theory:


Definition 42 

A partial order on a set $E$ is a relation $\preceq$ on $E$ with the following properties:
- For all $e \in E$, $e \preceq e$.
- For all $e, f \in E$, if $e \preceq f$ and $f \preceq e$, then $e = f$.
- For all $e, f, g \in E$, if $e \preceq f$ and $f \preceq g$, then $e \preceq g$.

An upper bound of a set $D \subset E$ is an element $e \in E$ such that $d \preceq e$ for all $d \in D$, and a maximal element of $E$ is an element $e$ such that for any $f \in E$, $e \preceq f \implies e = f$ (minimal element is defined similarly).

Notably, we do not need to have either $e \preceq f$ or $f \preceq e$ in a partial ordering, and a maximal element does not need to sit “on top” of everything else in $E$, because we can have other elements “to the side:”


Example 43 


If $S$ is a set, we can define a partial order on the powerset of $S$, in which $E \preceq F$ if $E$ is a subset of $F$. Then not all sets can be compared (specifically, it doesn't need to be true that either $E \preceq F$ or $F \preceq E$).


Definition 44 
Let $(E, \preceq)$ be a partially ordered set. Then a set $C \subset E$ is a chain if for all $e, f \in C$, we have either $e \preceq f$ or $f \preceq e$.


(In other words, we can always compare all elements in a chain.) 
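To make these definitions concrete, here is a small sketch using inclusion of finite sets as the partial order. The helper names `is_chain` and `maximal_elements` are illustrative and not from the lecture; Python's `<=` on `frozenset` is the subset relation.

```python
def is_chain(sets):
    """True if any two members are comparable under inclusion."""
    return all(a <= b or b <= a for a in sets for b in sets)

def maximal_elements(family):
    """Members e such that e <= f (with f in the family) forces e == f."""
    return [e for e in family if all(not (e <= f) or e == f for f in family)]

E = [frozenset(s) for s in ({1}, {2}, {1, 2}, {3})]

# {} <= {1} <= {1, 2} is a chain; {1} and {2} are incomparable.
print(is_chain([frozenset(), frozenset({1}), frozenset({1, 2})]))
print(is_chain([frozenset({1}), frozenset({2})]))

# Two maximal elements, neither of which sits above everything else in E.
print(maximal_elements(E))
```

The family `E` has the two maximal elements `{1, 2}` and `{3}`, which shows that a maximal element need not be an upper bound of the whole set.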


Proposition 45 (Zorn’s lemma) 


If every chain in a nonempty partially ordered set E has an upper bound, then E contains a maximal element. 


We'll take this as an axiom of set theory, and we'll give an application of this next lecture. But we can use it to 


prove other things as well, like the Axiom of Choice. 


Definition 46 


Let V be a vector space. A Hamel basis H C V is a linearly independent set such that every element of V is a 


finite linear combination of elements of H. 


We know from linear algebra that for finite-dimensional vector spaces, we find a basis and count its elements to find the dimension. (So a Hamel basis for $\mathbb{R}^n$ can be the standard $n$ basis elements, and the vectors $(1,0,0,\cdots)$, $(0,1,0,\cdots)$, and so on form a Hamel basis for the space of sequences with only finitely many nonzero terms; they do not span all of $\ell^1$, because only finite linear combinations are allowed.) And next time, we'll use Zorn's lemma to talk more about these Hamel bases!


5 March 2, 2021 


We'll prove the Hahn-Banach theorem today, which explains how to extend bounded linear functionals on a subspace to the whole normed vector space, answering the question of whether the dual space (of bounded linear functionals) is nontrivial for normed vector spaces.

Last time, we discussed Zorn's lemma from set theory (which we can take as an axiom), which tells us that a nonempty partially ordered set has a maximal element if every chain has an upper bound. (Remember that this notion involves a generalization $\preceq$ of the usual $\le$.) As a warmup, today we'll use this axiom to prove a fact about vector spaces. Recall that a Hamel basis of a vector space $V$ is a linearly independent set $H$, where every element of $V$ is a finite linear combination of elements of $H$. We know that finite-dimensional vector spaces always have a (regular) basis, and this is the analog for infinite-dimensional spaces:


Theorem 47 


If $V$ is a vector space, then it has a Hamel basis.

Proof. We'll construct a partially ordered set as follows: let $E$ be the set of linearly independent subsets of $V$ (which is nonempty, because it contains the empty set), and we define a partial order $\preceq$ by inclusion of those subsets. We now want to apply Zorn's lemma on $E$, so first we must check the condition: if $C$ is a chain in $E$ (meaning any two elements can be compared), we can define

$$c = \bigcup_{e \in C} e$$

to be the union of all subsets in the chain. We claim that $c$ is a linearly independent subset: to see that, consider a finite subset of elements $v_1, v_2, \cdots, v_n \in c$. Pick $e_1, e_2, \cdots, e_n \in C$ such that $v_j \in e_j$ for each $j$: by induction, because we can compare any two elements in $C$, we can also order finitely many elements in $C$ as well, and thus there is some $J$ such that $e_j \preceq e_J$ for all $j \in \{1, 2, \cdots, n\}$. So that means that all of $v_1, \cdots, v_n$ are in $e_J$, which is a linearly independent set by assumption. So indeed our arbitrary set $v_1, \cdots, v_n \in c$ is linearly independent, meaning $c$ is linearly independent.

And now notice that $e \preceq c$ for all $e \in C$; that is, $c$ is an upper bound of $C$. So the hypothesis of Zorn is verified, and we can apply Zorn's lemma to see that $E$ has some maximal element $H$.

We claim that $H$ spans $V$. Suppose otherwise: then there is some $v \in V$ such that $v$ is not a finite linear combination of elements in $H$, meaning that $H \cup \{v\}$ is linearly independent. But then $H \prec H \cup \{v\}$ (meaning $\preceq$ but not equality), so $H$ is not maximal, which is a contradiction. Thus $H$ must have spanned $V$, and that means $H$ is a Hamel basis of $V$.
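In finite dimensions, no appeal to Zorn's lemma is needed: a maximal linearly independent set can be reached by greedily adding vectors. The sketch below illustrates that finite analog only, with hypothetical helpers `rank` and `extend_to_maximal` working over the rationals; it does not mirror the transfinite argument in the proof.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_maximal(independent, candidates):
    """Add each candidate that keeps the set linearly independent."""
    basis = list(independent)
    for v in candidates:
        if rank(basis + [v]) == len(basis) + 1:
            basis.append(v)
    return basis

start = [(1, 1, 0)]
candidates = [(2, 2, 0), (0, 1, 0), (1, 2, 0), (0, 0, 1)]
basis = extend_to_maximal(start, candidates)
print(basis)
# Maximality forces spanning: the result has full rank in Q^3.
print(rank(basis) == 3)
```

Just as in the proof, a set that cannot be enlarged while staying independent must span the whole space.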


Now that we've seen Zorn’s lemma in action once, we're ready to use it to prove Hahn-Banach: 


Theorem 48 (Hahn-Banach) 


Let $V$ be a normed vector space, and let $M \subset V$ be a subspace. If $u : M \to \mathbb{C}$ is a linear map such that $|u(t)| \le C\|t\|$ for all $t \in M$ (in other words, we have a bounded linear functional), then there exists a continuous extension $U : V \to \mathbb{C}$ (which is an element of $\mathcal{B}(V, \mathbb{C}) = V'$) such that $U|_M = u$ and $|U(t)| \le C\|t\|$ for all $t \in V$ (with the same $C$ as above).

This result is very useful: in fact, it can be used to prove that the dual of $\ell^\infty$ is not $\ell^1$, even though the dual of $\ell^1$ is $\ell^\infty$.


To prove it, we'll first prove an intermediate result: 


Lemma 49 


Let $V$ be a normed space, and let $M \subset V$ be a subspace. Let $u : M \to \mathbb{C}$ be linear with $|u(t)| \le C\|t\|$ for all $t \in M$. If $x \notin M$, then there exists a function $u' : M' \to \mathbb{C}$ which is linear on the space $M' = M + \mathbb{C}x = \{t + ax : t \in M, a \in \mathbb{C}\}$, with $u'|_M = u$ and $|u'(t')| \le C\|t'\|$ for all $t' \in M'$.


We can think of M as a plane and x as a vector outside of that plane: then we're basically letting ourselves extend 
u in one more dimension, and the resulting bounded linear functional has the same bound that u did. The reason this 
is a helpful strategy is that we'll apply Zorn’s lemma to the set of all continuous extensions of u, placing a partial order 
using extension. Then we'll end up with a maximal element, and we want to conclude that this maximal continuous 
extension is defined on V. So this lemma helps us do that last step of contradiction, much like with the proof of 
existence for a Hamel basis. 


Let's first prove the Hahn-Banach theorem assuming the lemma: 


Proof of Theorem 48. Let $E$ be the set of all continuous extensions

$$E = \{(v, N) : N \text{ subspace of } V, \ M \subset N, \ v \text{ is a continuous extension of } u \text{ to } N\},$$

meaning that $v$ is a bounded linear functional on $N$ with the same bound $C$ as the original functional $u$. This is nonempty because it contains $(u, M)$. We now define a partial order on $E$ as follows:

$$(v_1, N_1) \preceq (v_2, N_2) \quad \text{if} \quad N_1 \subset N_2, \ v_2|_{N_1} = v_1$$

(in other words, $v_2$ is a continuous extension of $v_1$). We can check for ourselves that this is indeed a partial order, and we want to check the hypothesis for Zorn's lemma. To do this, let $\mathcal{C} = \{(v_i, N_i) : i \in I\}$ be a chain in $E$ indexed by the set $I$ (so that for all $i_1, i_2 \in I$, we have either $(v_{i_1}, N_{i_1}) \preceq (v_{i_2}, N_{i_2})$ or vice versa).

So then if we let $N = \bigcup_{i \in I} N_i$ be the union of all such subspaces $N_i$, we can check that this is a subspace of $V$. This is not too hard to show: let $x_1, x_2 \in N$ and $a_1, a_2 \in \mathbb{C}$. Then we can find indices $i_1, i_2$ such that $x_1 \in N_{i_1}$ and $x_2 \in N_{i_2}$, and one of these subspaces $N_{i_1}, N_{i_2}$ is contained in the other because $\mathcal{C}$ is a chain. So (without loss of generality), we know that $x_1, x_2$ are both in $N_{i_2}$, and we can use closure in that subspace to show that $a_1 x_1 + a_2 x_2 \in N_{i_2} \subset N$.

And now that we have the subspace $N$, we need to make it into an element of $E$ by defining a linear functional $v : N \to \mathbb{C}$ which satisfies the desired conditions. But the way we do this is not super surprising: for any $t \in N$, we know that $t \in N_i$ for some $i$, and then we define $v(t) = v_i(t)$. This is indeed well-defined: if $t \in N_{i_1} \cap N_{i_2}$, it is true that $v_{i_1}(t) = v_{i_2}(t)$, because we're still in a chain and thus one of $(v_{i_1}, N_{i_1})$ and $(v_{i_2}, N_{i_2})$ is an extension of the other by definition. Similar arguments (exercise to write out the details) also show that $v$ is linear, and that it's an extension of any $v_i$ (including the bound with the constant $C$). So $(v_i, N_i) \preceq (v, N)$ for all $i \in I$, and we have an upper bound for our chain.

This means we've verified the Zorn's lemma condition, and now we can say that $E$ has a maximal element $(U, N)$. We want to show that $N = V$ (which would give us the desired conclusion); suppose not. Then there is some $x \in V$ that is not in $N$, and then Lemma 49 tells us that there is a continuous extension $v$ of $U$ to $N + \mathbb{C}x$, which must then also be a continuous extension of $u$. So $(v, N + \mathbb{C}x)$ is an element of $E$, but that means $(U, N) \prec (v, N + \mathbb{C}x)$, contradicting $(U, N)$ being a maximal element. So $N = V$ and we're done.


We'll now return to the (more computational) proof of the lemma: 


Proof of Lemma 49. We can check on our own that $M' = M + \mathbb{C}x$ is a subspace (this is not hard to do), but additionally, we can show that the representation of an arbitrary $t' \in M'$ as $t + ax$ (for $t \in M$ and $a \in \mathbb{C}$) is unique. This is because

$$t + ax = \tilde{t} + \tilde{a}x \implies (a - \tilde{a})x = \tilde{t} - t \in M,$$

which means that $x \in M$ (contradiction) unless $a = \tilde{a}$, which then implies that $t = \tilde{t}$. We need this fact because we want to define our continuous extension in a well-defined way: if we choose an arbitrary $\lambda \in \mathbb{C}$, then the map

$$u'(t + ax) = u(t) + a\lambda$$

is indeed well-defined on $M'$, and then the map $u' : M' \to \mathbb{C}$ is linear. If the bounding constant $C$ is zero, then $u$ is just zero and we can extend it by using the zero function on $M'$. Otherwise, we can divide by $C$ and thus assume (without loss of generality) that $C = 1$. It remains to choose our $\lambda$ so that for all $t \in M$ and $a \in \mathbb{C}$, we have $|u(t) + a\lambda| \le \|t + ax\|$, which would show the desired bound and give us the continuous extension.

To do this, note that the inequality already holds whenever $a = 0$ (because it holds on $M$), so we just need to choose $\lambda$ to make the inequality work for $a \ne 0$. Dividing both sides by $|a|$ yields (for all $a \ne 0$)

$$\left| u\left(\frac{t}{-a}\right) - \lambda \right| \le \left\| \frac{t}{-a} - x \right\|.$$

We know that $\frac{t}{-a} \in M$ because $t \in M$, so this bound is equivalent to showing that

$$|u(t) - \lambda| \le \|t - x\| \quad \forall t \in M.$$

To do this, we'll choose the real and imaginary parts of $\lambda$. First, we show there is some $\alpha \in \mathbb{R}$ such that

$$|w(t) - \alpha| \le \|t - x\|$$

for all $t \in M$, where $w(t) = \frac{u(t) + \overline{u(t)}}{2}$ is the real part of $u(t)$. Notice that $|w(t)| = |\mathrm{Re}\, u(t)| \le |u(t)| \le \|t\|$ by assumption, and because $w$ is real-valued,

$$w(t_1) - w(t_2) = w(t_1 - t_2) \le |w(t_1 - t_2)| \le \|t_1 - t_2\|$$

(the middle step here is where we use that $w$ is real-valued). Connecting this back to the expression $\|t - x\|$, we can add and subtract $x$ from above and use the triangle inequality to get

$$w(t_1) - w(t_2) \le \|t_1 - x\| + \|t_2 - x\|.$$

Thus, for all $t_1, t_2 \in M$, we have

$$w(t_1) - \|t_1 - x\| \le w(t_2) + \|t_2 - x\|,$$

and thus we can take the supremum of the left-hand side over all $t_1$ to get

$$\sup_{t \in M} \big( w(t) - \|t - x\| \big) \le w(t_2) + \|t_2 - x\|$$

for all $t_2 \in M$, and thus

$$\sup_{t \in M} \big( w(t) - \|t - x\| \big) \le \inf_{t \in M} \big( w(t) + \|t - x\| \big).$$

So now we choose $\alpha$ to be a real number between the left-hand side and right-hand side, and we claim this value works. For all $t \in M$, we have

$$w(t) - \|t - x\| \le \alpha \le w(t) + \|t - x\|,$$

and now rearranging yields

$$-\|t - x\| \le \alpha - w(t) \le \|t - x\| \implies |w(t) - \alpha| \le \|t - x\|,$$


and we've shown the desired bound. So now we just need to do something similar for the imaginary part, and we do 
so by repeating this argument with ix instead of x. This then defines our function u’ on all of M+ Cx, and we're 


done (we can check that because the desired bound holds on both the real and imaginary “axes” of x, it holds for all 


complex multiples of x). 
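The choice of $\alpha$ between the supremum and the infimum can be checked numerically in a small real example. The sketch below makes assumptions that are not in the lecture: real scalars only, $V = \mathbb{R}^2$ with the norm $\|(a, b)\| = |a| + |b|$, $M$ the first coordinate axis, $w(t, 0) = t$, and $x = (0, 1)$. All names are illustrative, and in general a finite grid only approximates the supremum and infimum (here both are attained at sampled points).

```python
from fractions import Fraction

def norm1(v):
    return abs(v[0]) + abs(v[1])

def w(t):
    """Functional on M = {(t, 0)}: w(t, 0) = t, so |w| <= norm1 on M."""
    return t

# Sample points of M; x = (0, 1) is the new direction, so (t, 0) - x = (t, -1).
ts = [Fraction(k, 2) for k in range(-10, 11)]

lower = max(w(t) - norm1((t, -1)) for t in ts)   # sup of w(t) - ||t - x||
upper = min(w(t) + norm1((t, -1)) for t in ts)   # inf of w(t) + ||t - x||
alpha = (lower + 3 * upper) / 4                  # any value in [lower, upper] works

def extended(t, a):
    """w'(t + a x) = w(t) + a * alpha on M + R x."""
    return w(t) + a * alpha

print(lower, upper, alpha)
# The extension keeps the same bound: |w'(t + a x)| <= ||t + a x||.
print(all(abs(extended(t, a)) <= norm1((t, a)) for t in ts for a in ts))
```

In this example every $\alpha \in [-1, 1]$ works, which also shows that the extension is generally not unique.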


6 March 4, 2021 


We'll finish our discussion of the Hahn-Banach theorem today. Recall that this theorem tells us that a bounded linear functional on a subspace of a normed space satisfying $|u(t)| \le C\|t\|$ (on the subspace) can be extended to a bounded linear functional on the whole space with the same bound. The proof is important to see, but what's more important is how we can use it as a tool. We mentioned that we can show that the dual of $\ell^\infty$ is not $\ell^1$, and here's something else we can do:

Theorem 50

Let $V$ be a normed space. Then for all $v \in V \setminus \{0\}$, there exists an $f \in V'$ (a bounded linear functional) with $\|f\| = 1$ and $f(v) = \|v\|$.


Proof. First, define the linear map $u : \mathbb{C}v \to \mathbb{C}$ (here, $\mathbb{C}v$ denotes the span of $v$) by defining $u(\lambda v) = \lambda \|v\|$ (this is well-defined because every element in the span of $v$ can be uniquely represented this way, and it's also clearly linear because only $\lambda$ is varying). Then it is indeed true that $|u(t)| \le \|t\|$ for all $t \in \mathbb{C}v$, and also $u(v) = \|v\|$. Therefore, by Hahn-Banach, there exists an element $f$ of the dual space extending $u$, such that $|f(t)| \le \|t\|$ for all $t$. So we've found a linear functional so that $f(v) = u(v) = \|v\|$, and also with operator norm 1 (we know it is exactly 1 because we have equality when applying $f$ to $\frac{v}{\|v\|}$), and we're done.
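In a Euclidean space the functional promised by Theorem 50 can be written down explicitly, so Hahn-Banach is not needed there. The sketch below is an illustration in $\mathbb{R}^2$ with the Euclidean norm (an assumption for this example), taking $f(w) = \langle w, v \rangle / \|v\|$; the helper name `norming_functional` is hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def norming_functional(v):
    """Return f with f(w) = <w, v> / ||v||, defined for v != 0."""
    length = norm(v)
    return lambda w: sum(a * b for a, b in zip(w, v)) / length

v = (3.0, 4.0)
f = norming_functional(v)

# f attains the norm at v ...
print(f(v), norm(v))

# ... and the Cauchy-Schwarz inequality gives |f(w)| <= ||w|| for every w.
samples = [(1.0, 0.0), (0.0, 1.0), (-2.0, 5.0), (3.0, 4.0), (4.0, -3.0)]
print(all(abs(f(w)) <= norm(w) + 1e-12 for w in samples))
```

In a general normed space there is no inner product to provide such a formula, which is where the extension argument in the proof does the work.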


Definition 51 


The double dual of a normed space V, denoted V”, is the dual of V’. 


In other words, V” is the set of bounded linear functionals on the set of bounded linear functionals on V. 


Example 52 
Fix an element $v \in V$, and define the element $T_v : V' \to \mathbb{C}$ by setting

$$T_v(v') = v'(v)$$

for all linear functionals $v' \in V'$. Then $T_v$ is an element of the double dual.

To check this, we should make sure $T_v$ is linear in the argument $v'$, and this is true because we're applying functionals to a fixed $v$:

$$T_v(v_1' + v_2') = (v_1' + v_2')(v) = v_1'(v) + v_2'(v).$$

We should also check that $T_v$ is bounded: indeed,

$$|T_v(v')| = |v'(v)| \le \|v'\| \, \|v\|$$

(because $v'$ is some bounded linear functional with norm $\|v'\|$). And since $\|v\|$ is a constant, we've found that the norm of $T_v$ is at most $\|v\|$, and thus $T_v$ is indeed in the double dual of $V$.


Definition 53 
Let $V$ and $W$ be normed spaces. A bounded linear operator $T \in \mathcal{B}(V, W)$ is isometric if for all $v \in V$, $\|Tv\| = \|v\|$.


Theorem 54 
Let $v \in V$, and define the element $T_v : V' \to \mathbb{C}$ of the double dual via $T_v(v') = v'(v)$. Then $T : V \to V''$ sending $v$ to $T_v$ is isometric.

Proof. We've already done a lot of the work here: we showed already that each $T_v$ is a bounded linear functional with $\|T_v\| \le \|v\|$, and the map $v \mapsto T_v$ is linear (noting that $T_v(v')$ is linear in both $v$ and in $v'$). So the map $T$ sending $v \mapsto T_v$ is in $\mathcal{B}(V, V'')$, and we just need to show that it is isometric.

Since $\|T_v\| \le \|v\|$ from our work above, we know that $\|T\| \le 1$, and it suffices to show the equality $\|T_v\| = \|v\|$ for all $v$. It's clear that $\|T_0\| = \|0\|$, and now if $v \in V \setminus \{0\}$ is a nonzero vector, then there exists some $f \in V'$ such that $\|f\| = 1$ and $f(v) = \|v\|$ (by Theorem 50). So now

$$\|v\| = f(v) = |f(v)| = |T_v(f)| \le \|T_v\| \, \|f\| = \|T_v\|,$$

and thus $\|v\| \le \|T_v\|$. Putting this together with the reverse inequality above yields the result $\|T_v\| = \|v\|$, and thus $T$ is isometric.


Notice that isometric bounded operators are one-to-one, because the only thing that can be sent to the zero vector is the zero vector if lengths are preserved. It's natural to ask whether the operator $v \mapsto T_v$ is also onto (surjective), and there is a special categorization for that:


Definition 55 


A Banach space $V$ is reflexive if $V = V''$, in the sense that the map $v \mapsto T_v$ is onto.


Example 56 
For all $1 < p < \infty$, we know that $\ell^p$ is reflexive (since the dual of $\ell^p$ is $\ell^q$, whose dual is $\ell^p$ again). But $\ell^1$ is not reflexive, because the dual of its dual $\ell^\infty$ is not $\ell^1$. And the space $c_0$ of sequences converging to 0 is also not reflexive: we can identify $(c_0)'$ with $\ell^1$, whose dual is $\ell^\infty$.


With this, we've concluded our general discussion about Banach spaces, and we are now moving to Lebesgue measure and integration. We've been talking about $\ell^p$ spaces so far on sequences, and it makes sense to try to define $L^p$ spaces on functions in a similar way. But using Riemann integration isn't quite good enough: Lebesgue integration has better convergence theorems, in the sense that they're more widely useful. And for a concrete example, consider the space of Riemann integrable functions on $[0, 1]$

$$L^1_R([0, 1]) = \{f : [0,1] \to \mathbb{C} : f \text{ Riemann integrable on } [0, 1]\}.$$

(We integrate a complex-valued function by integrating the real and imaginary parts separately here.) We may try to define a norm via

$$\|f\|_1 = \int_0^1 |f(x)| \, dx$$

(it's not quite a norm because we can have a function which is nonzero at only a single point, but let's pretend it's a norm), and the problem we'll run into is that we don't have a Banach space! So more general integration will help us get completeness, which is important for applications like differential equations.

To get the Lebesgue $L^p$ spaces, we can take the completion of the $L^1_R$ space that we defined above, much like the real numbers can be defined as the completion of the rational numbers. But we can do things from the ground up instead, and we'll indeed see along the way that the Riemann integrable functions are dense in the set of Lebesgue integrable functions.


Fact 57 


Our goal is to make a new definition of integration that is more general than Riemann integration: it will still be a method of calculating area under a curve, but we'll build it up in a way that allows for more powerful formalism.

And the way we'll define this is to start with functions $1_E$ that are 1 on some set $E$ and 0 otherwise, which we call indicator functions. We'll get to definitions and theorems in a second, but we know what we want those functions to integrate to in some special cases: if $E = [a, b]$, then the integral $\int 1_E(x)\,dx = \int 1_{[a,b]}(x)\,dx$ should be the area under the curve, which is $b - a$. So for more complicated sets $E$, we'll define the integral of $1_E$ to look a lot like the “length” of $E$, and that's more generally going to be called the measure $m(E)$ of $E$.

In other words, the first step of getting an integral defined is to get a measure defined on subsets of R, and this is 
what will be called the Lebesgue measure. From our discussion above, there are a few properties of this Lebesgue 


measure that we already know we want to have: 


1. We want to be able to measure any subset of the real numbers (because Riemann integration can't deal with functions like $1_{\mathbb{Q}}$). In other words, we want to define the function $m$ on $\mathcal{P}(\mathbb{R})$, the powerset of $\mathbb{R}$.

2. As a sanity check, if $I$ is an interval, $m(I)$ should be the length of $I$ (and the measure shouldn't care about whether we have open or closed intervals).

3. The measure of a whole set should be the sum of the measures of its chunks: more formally, if $\{E_n\}$ is a countable collection of disjoint sets and $E = \bigcup_n E_n$, then we want $m(E) = \sum_n m(E_n)$.

4. Translation invariance should hold: if $E$ is a subset of $\mathbb{R}$, and $x \in \mathbb{R}$ is some constant, then $m(x + E) = m(E)$.


But unfortunately, even these four properties are impossible to satisfy at the same time: it turns out that there is no function $m : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ that satisfies these conditions! (We can search up the Vitali construction for more details.) So what we'll do is to drop the first assumption: we'll try to define a function $m$ on only some of the subsets of $\mathbb{R}$, while still satisfying properties (2), (3), (4), and we'll show that the set of such Lebesgue measurable sets is indeed pretty large.

The strategy for doing this comes from Caratheodory: we'll first define a function $m^* : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ called the outer measure, which satisfies conditions (2), (4), and “almost (3),” and then we'll restrict $m^*$ to appropriately well-behaved subsets of $\mathbb{R}$ to get our actual construction.


Definition 58 


For any interval $I \subset \mathbb{R}$, let $\ell(I)$ denote its length (regardless of whether it is open, closed, or half-closed). For any subset $A \subset \mathbb{R}$, we define the outer measure $m^*(A)$ via

$$m^*(A) = \inf\left\{ \sum_n \ell(I_n) : \{I_n\} \text{ countable collection of open intervals with } A \subset \bigcup_n I_n \right\}.$$

(Through this definition, we can see that $m^*(A) \ge 0$ for all $A$.)

Basically, we can cover any subset of the real numbers with a union of open intervals, and we take the smallest possible total length (as an infimum) over all coverings. (The idea is that as we make the intervals smaller, we can get more information about the subset $A$, and the infimum gives us the best possible information about “how much length” is in $A$.)


Example 59 


Consider the set A = {0} containing just a single point. 


Then $m^*(\{0\}) = 0$, because we can cover $\{0\}$ with the interval $(-\varepsilon/2, \varepsilon/2)$ for any $\varepsilon > 0$, and this interval has length $\varepsilon$. So $0 \le m^*(\{0\}) \le \varepsilon$ for all $\varepsilon$, and taking $\varepsilon \to 0$ gives us $m^*(\{0\}) = 0$. A similar argument shows that any finite set of points has outer measure zero, and in fact the outer measure of a countable set is always zero:


Theorem 60


If $A \subset \mathbb{R}$ is countable, then $m^*(A) = 0$.


For example, even though there are a lot of rational numbers and they are dense in R, we're saying that they don't 


actually fill up a lot of space — the measure of Q is zero. 


Proof. (We can check the case where $A$ is finite ourselves.) If $A$ is countably infinite, then there is a bijection from $A$ to $\mathbb{N}$, so we can enumerate the elements as $\{a_1, a_2, a_3, \dots\} = \{a_n : n \in \mathbb{N}\}$. Pick some $\varepsilon > 0$; we'll show that $m^*(A) \le \varepsilon$.

For each $n \in \mathbb{N}$, let $I_n$ be the interval $(a_n - \varepsilon/2^{n+1}, a_n + \varepsilon/2^{n+1})$. Because $I_n$ is an interval containing $a_n$, the set $A$ must be contained in the (countable) union of the intervals $I_n$, and the outer measure is an infimum over all such coverings, so

\[ m^*(A) \le \sum_n \ell(I_n) = \sum_n \frac{\varepsilon}{2^n} = \varepsilon. \]

Finally, taking $\varepsilon \to 0$ yields the result.
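The covering used in this proof can be checked on a small case. The following is a minimal sketch, assuming a hypothetical finite piece of an enumeration of the rationals in $[0, 1]$ (the helper name `cover_total_length` and the choice of points are illustrative, not from the lecture); it uses exact rational arithmetic so that the total length can be compared with $\varepsilon$ without rounding.

```python
from fractions import Fraction

def cover_total_length(points, eps):
    """Cover the n-th point a_n (n = 1, 2, ...) by an open interval of
    half-width eps / 2**(n+1), as in the proof, and return the intervals
    together with the sum of their lengths."""
    intervals = []
    total = Fraction(0)
    for n, a in enumerate(points, start=1):
        half = eps / 2 ** (n + 1)
        intervals.append((a - half, a + half))
        total += 2 * half  # each interval has length eps / 2**n
    return intervals, total

# Hypothetical sample: the distinct rationals in [0, 1] with denominator <= 7.
points = sorted({Fraction(p, q) for q in range(1, 8) for p in range(q + 1)})
eps = Fraction(1, 100)
intervals, total = cover_total_length(points, eps)
```

No matter how many points we list, the total length stays below `eps`, which is exactly why the infimum in the definition of $m^*$ is zero for a countable set.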


We can now talk about what it means for the outer measure to “almost satisfy (3)” in the set of properties above, 


and the argument is pretty similar to what we did just now. But first, we establish a quick fact: 


Lemma 61 


If $A \subset B$, then $m^*(A) \le m^*(B)$.


Proof. Any covering of B is a covering of A, so the infimum (of interval length sums) over all coverings of A should 


be at most the infimum over all coverings of B. 


Theorem 62 


Let $\{A_n\}$ be a countable collection of subsets of $\mathbb{R}$, not necessarily disjoint. Then

\[ m^*\Big(\bigcup_n A_n\Big) \le \sum_n m^*(A_n). \]


(This is basically “half” of the additivity condition that we wanted.) 


Proof. First of all, if there is some $n$ such that $m^*(A_n) = \infty$ (meaning we can't cover the set by a collection of intervals whose sum of lengths is finite), or if $\sum_n m^*(A_n) = \infty$, then the inequality is true (because the right-hand side is already $\infty$). So we can just consider the case where all of the outer measures of the $A_n$ are finite, and the sum of those outer measures also converges.

The strategy here is going to come up frequently: instead of proving an inequality of the form $X \le Y$ (for two quantities $X$ and $Y$), we can equivalently prove that $X \le Y + \varepsilon$ for any $\varepsilon > 0$. We'll do that here: fix some $\varepsilon > 0$, and for each $n$ define the collection $\{I_{nk}\}_{k \in \mathbb{N}}$ of open intervals to be a covering of $A_n$ with total length $\sum_k \ell(I_{nk}) < m^*(A_n) + \varepsilon/2^n$ (we can't always achieve the infimum given by the outer measure, but we can always achieve a slightly larger number). Now because each $A_n$ is covered by $\{I_{nk}\}_k$, the union of the $A_n$s must be contained in the union $\bigcup_{n,k \in \mathbb{N}} I_{nk}$ (which is indeed a countable union of intervals as well). Thus,

\[ m^*\Big(\bigcup_n A_n\Big) \le \sum_{n,k} \ell(I_{nk}) = \sum_n \sum_k \ell(I_{nk}), \]

and now we can sum over $k$ to find that this is

\[ < \sum_n \Big(m^*(A_n) + \frac{\varepsilon}{2^n}\Big) = \sum_n m^*(A_n) + \varepsilon. \]

Taking $\varepsilon \to 0$ gives the desired result.


In particular, we should notice the similarities in this proof with the one in Theorem 60: the previous proof we did was basically a special case where each $A_n$ was a single point.

In our homework, we'll be able to check that the outer measure is indeed translation-invariant (so it satisfies (4)), and it seems like the next step is to show that $m^*(I) = \ell(I)$ for an interval $I$ (so (2) is also satisfied). This may be
intuitive, but it'll take a bit of work to show! So that'll be the first thing we do next lecture, and it'll complete our 


construction of the outer measure and allow us to define the Lebesgue measure. 


7 March 11, 2021 


Last time, we introduced the outer measure $m^*$, which has many of the properties that we want in an actual measure. We'll now use this outer measure to define a measure on a class of well-behaved subsets of $\mathbb{R}$ (which will then allow the measure to satisfy translation invariance and countable additivity).


We proved last lecture that the outer measure is countably subadditive:

\[ m^*\Big(\bigcup_n E_n\Big) \le \sum_n m^*(E_n). \]


It turns out equality doesn’t hold until we restrict to measurable subsets, so (as we mentioned previously) we don’t 


exactly get the condition we want for a measure. But we can verify one of the other conditions: 


Proposition 63 


If $I$ is an interval of $\mathbb{R}$, then $m^*(I) = \ell(I)$.


In other words, we can't cover an interval of length $\ell(I)$ with a collection of intervals of smaller total length.


Proof. First, suppose that $I$ is a closed and bounded interval $[a, b]$. It suffices to show two inequalities. First, we can easily check that $m^*(I) \le \ell(I)$ (because $I$ is contained in $(a - \varepsilon, b + \varepsilon)$ for any $\varepsilon > 0$, meaning that $m^*(I) \le \ell(I) + 2\varepsilon$, and then we can take $\varepsilon \to 0$), and in particular this means that the outer measure is finite. Next, let's show that $\ell(I) \le m^*(I)$: in other words, any collection of open intervals covering $[a, b]$ has total length at least $b - a$. Suppose that $\{I_n\}$ is a collection of open intervals such that $[a, b] \subset \bigcup_n I_n$. A closed and bounded interval is a compact set, and this compact set is covered by a bunch of open intervals. Thus, the Heine-Borel theorem tells us that a finite collection of the $\{I_n\}$ is sufficient to cover $[a, b]$, and we will label this finite collection $\{J_1, \dots, J_N\}$.

So now we know that $[a, b] \subset \bigcup_{k=1}^N J_k$, and the idea now is to rearrange the indexing of the open intervals. We know that one of the intervals must include the leftmost point $a$, so we'll call that $J_1$. Then (if we haven't covered the whole interval yet) there is some interval that overlaps with $J_1$, which we call $J_2$. Continuing in this way, we will eventually cover the rightmost point $b$ of the interval, so that $J_i$ and $J_{i+1}$ are always linked.

More rigorously, we know that there exists some $k_1 \in \{1, \dots, N\}$ with $a \in J_{k_1}$, so we rearrange the finitely many intervals so that $k_1 = 1$, and suppose that this interval is $(a_1, b_1)$. If $[a, b]$ is not completely covered, then $b_1 \le b$, and there must be some integer $k_2$ such that $b_1 \in J_{k_2}$ (because $b_1$ is not covered by the first interval $J_1$). We then rearrange the remaining intervals so that $k_2 = 2$, and this new interval looks like $(a_2, b_2)$. And now $b_2$ is either larger than $b$, or we can repeat the process again to find $(a_3, b_3)$: eventually we must pass $b$ because the intervals do actually cover $[a, b]$.
So now we know that there exists some $K \in \{1, \dots, N\}$ such that for all $k \in \{1, \dots, K - 1\}$,

\[ b_k \le b, \qquad a_{k+1} < b_k < b_{k+1}, \]

and we also have $b < b_K$ (meaning the $K$th interval gets us to the rightmost endpoint). But now

\[ \sum_n \ell(I_n) \ge \sum_{k=1}^N \ell(J_k) \ge \sum_{k=1}^K \ell(J_k) = (b_K - a_K) + (b_{K-1} - a_{K-1}) + \cdots + (b_1 - a_1), \]

and now we can bound this by regrouping the finite sum as

\[ = b_K + (b_{K-1} - a_K) + (b_{K-2} - a_{K-1}) + \cdots + (b_1 - a_2) - a_1 \ge b_K - a_1 > b - a = \ell(I), \]

completing the proof of this special case (the sum of lengths of intervals is at least $\ell(I)$ for any collection, meaning the infimum $m^*(I)$ is at least $\ell(I)$ as well).

The cases for other types of intervals now follow easily. If $I$ is any finite interval $[a, b)$, $(a, b]$, or $(a, b)$, note that $[a + \varepsilon, b - \varepsilon] \subset I \subset [a - \varepsilon, b + \varepsilon]$ (making intervals a little fatter or thinner covers or gets us completely inside $I$), and thus

\[ m^*([a + \varepsilon, b - \varepsilon]) \le m^*(I) \le m^*([a - \varepsilon, b + \varepsilon]) \implies (b - a) - 2\varepsilon \le m^*(I) \le (b - a) + 2\varepsilon, \]

and taking $\varepsilon \to 0$ gives us the desired result. Finally, an infinite interval $(-\infty, a)$, $(a, \infty)$, $(-\infty, a]$, $[a, \infty)$, or $(-\infty, \infty)$ cannot be covered by a collection of intervals of finite total length (this is an exercise for us to work out).
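The chaining step in the proof of Proposition 63 can be sketched as a small greedy procedure. This is only an illustration under assumptions: the function name `chain_cover`, the integer endpoints, and the sample cover are hypothetical, and the code assumes it is handed a finite family of open intervals that really does cover $[a, b]$.

```python
def chain_cover(a, b, intervals):
    """Greedy version of the chaining argument: start from an interval
    containing a, then repeatedly pick an interval containing the current
    right endpoint, until some right endpoint passes b.
    Returns the chain J_1, ..., J_K as a list of (lo, hi) pairs."""
    chain = []
    point = a
    remaining = list(intervals)
    while True:
        # Some open interval must contain the current point, otherwise the
        # family would not cover [a, b] (next() raises in that case).
        lo, hi = next(iv for iv in remaining if iv[0] < point < iv[1])
        remaining.remove((lo, hi))
        chain.append((lo, hi))
        if hi > b:
            return chain
        point = hi  # hi lies in [a, b] but is not covered by (lo, hi)

a, b = 0, 100
cover = [(50, 120), (-10, 40), (30, 60), (35, 45)]  # hypothetical finite cover
chain = chain_cover(a, b, cover)
total = sum(hi - lo for lo, hi in chain)
```

Consecutive intervals in the chain overlap, so the chain alone already has total length at least $b - a$; the unused interval $(35, 45)$ only adds more length to the full cover.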


The next result basically tells us that the outer measure of sets can be approximated by the outer measure of open 


sets: 


Theorem 64 


For every subset $A \subset \mathbb{R}$ and $\varepsilon > 0$, there exists an open set $O$ such that $A \subset O$ and $m^*(A) \le m^*(O) \le m^*(A) + \varepsilon$.


Proof. The result is clear if $m^*(A)$ is infinite (take $O$ to be the whole real line). Otherwise, $m^*(A)$ is finite, and we let $\{I_n\}$ be a collection of open intervals that cover $A$ and have total length at most $m^*(A) + \varepsilon$. Then $O = \bigcup_n I_n$, this union of open intervals, is a union of open sets (so it is open), and it is clear that $A \subset O$ (so $m^*(A) \le m^*(O)$ by Lemma 61) and (by the countable subadditivity we proved last time)

\[ m^*(O) = m^*\Big(\bigcup_n I_n\Big) \le \sum_n m^*(I_n) \le \sum_n \ell(I_n) \le m^*(A) + \varepsilon. \]


So (indeed) with respect to outer measure, every set can be approximated by a suitable open set. And now we're 


ready to talk about what “suitably nice” subsets of R look like: 


Definition 65 
A set $E \subset \mathbb{R}$ is Lebesgue measurable if for all $A \subset \mathbb{R}$,

\[ m^*(A) = m^*(A \cap E) + m^*(A \cap E^c). \]

In other words, $E$ is well-behaved if it always cuts $A$ into reasonable parts. Notice that we always know that

\[ m^*(A) \le m^*(A \cap E) + m^*(A \cap E^c) \]

by subadditivity and using that $A \subset (A \cap E) \cup (A \cap E^c)$, so measurability of a set $E$ is really telling us that

\[ m^*(A \cap E) + m^*(A \cap E^c) \le m^*(A). \]


Lemma 66 


The empty set $\emptyset$ and the set of real numbers $\mathbb{R}$ are measurable, and a set $E$ is measurable if and only if $E^c$ is measurable.


Proof. All of these are readily verifiable from the definition of measurability, which is symmetric in $E$ and $E^c$.


Proposition 67 


If a set E has zero outer measure, meaning m*(E) = 0, then E is measurable. 


Proof. Because $A \cap E \subset E$, we know that $m^*(A \cap E) \le m^*(E) = 0$, which means $m^*(A \cap E) = 0$. So now

\[ m^*(A \cap E) + m^*(A \cap E^c) = m^*(A \cap E^c) \le m^*(A) \]

(because $A \cap E^c \subset A$), and this is a sufficient condition for measurability.


This shows us that a lot of “uninteresting” sets are measurable, and we don’t have many interesting examples of 
measurable sets. But it turns out that every open set is measurable, which means that (taking complements) every 
closed set is also measurable. There are in fact many more sets that are measurable — most things we can write down 
are — because taking unions and intersections of basic sets will always give us measurable sets. But before we explain 


that, we need to establish a few properties: 


Proposition 68 


If $E_1$ and $E_2$ are measurable sets, then $E_1 \cup E_2$ is measurable.


Proof. We need to verify the Lebesgue measurable condition. Let $A$ be an arbitrary subset of $\mathbb{R}$ (we may assume $m^*(A)$ is finite, since otherwise the required inequality is trivial): since $E_2$ is measurable, we know that

\[ m^*(A \cap E_1^c) = m^*(A \cap E_1^c \cap E_2) + m^*(A \cap E_1^c \cap E_2^c), \]

and now $E_1^c \cap E_2^c = (E_1 \cup E_2)^c$ by de Morgan's law. On the other hand, we know that

\[ A \cap (E_1 \cup E_2) = (A \cap E_1) \cup (A \cap E_2) = (A \cap E_1) \cup (A \cap E_2 \cap E_1^c) \]

(because things that are in both $A$ and $E_1$ are already included in the first term). Putting these expressions together,

\[ m^*(A \cap (E_1 \cup E_2)) \le m^*(A \cap E_1) + m^*(A \cap E_2 \cap E_1^c), \]

and now because $E_1$ is measurable, we can rewrite this as

\[ = m^*(A) - m^*(A \cap E_1^c) + m^*(A \cap E_2 \cap E_1^c) = m^*(A) - m^*(A \cap (E_1 \cup E_2)^c), \]

and comparing the first and last expressions in this chain gives us the desired measurability inequality.


Using induction on the result above gives us a slightly more general fact:


Corollary 69


If sets $E_1, \dots, E_n$ are measurable, then $\bigcup_{k=1}^n E_k$ is measurable.


(The base case is clear, and we induct on the number of sets included in the union by adding one more set at a time: $\bigcup_{k=1}^{n+1} E_k = \big(\bigcup_{k=1}^n E_k\big) \cup E_{n+1}$.) And with this result, now we're ready to discuss the structure of the set of


measurable sets more explicitly: 


Definition 70 

A nonempty collection of sets $\mathcal{A} \subset \mathcal{P}(\mathbb{R})$ is an algebra (not the same as the “algebra” in algebra) if for all $E \in \mathcal{A}$, we have $E^c \in \mathcal{A}$, and for all $E_1, \dots, E_n \in \mathcal{A}$, we have $\bigcup_{k=1}^n E_k \in \mathcal{A}$. Furthermore, an algebra $\mathcal{A}$ is a $\sigma$-algebra if we have the additional condition that for any countable collection $\{E_n\}_{n=1}^\infty$ of sets in $\mathcal{A}$, the union $\bigcup_n E_n$ is also in the algebra.


In words, algebras are closed under complements and finite unions, while sigma-algebras also need to be closed under countable unions. And in fact, de Morgan's laws tell us that if $E_1, \dots, E_n \in \mathcal{A}$, then their intersection $\bigcap_{k=1}^n E_k = \big(\bigcup_{k=1}^n E_k^c\big)^c$ is also in the algebra. So closure holds under both finite unions and finite intersections, and in particular that means that $\emptyset = E \cap E^c$ must be in $\mathcal{A}$ (because an algebra is always nonempty), and thus $\mathbb{R} = \emptyset^c$ is also always in $\mathcal{A}$. (And similarly, countable intersections of sets are also in $\sigma$-algebras $\mathcal{A}$.)
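To see these closure properties in action, here is a minimal sketch on a finite universe, which is an assumption made purely for illustration (on a finite set every algebra is automatically a sigma-algebra, so this does not capture the countable case). The helper `generated_algebra` is a hypothetical name: it closes a family of subsets under complements and pairwise unions until nothing new appears.

```python
from itertools import combinations

def generated_algebra(universe, generators):
    """Close a family of subsets of a finite universe under complements
    and pairwise unions, iterating until the family stops growing."""
    universe = frozenset(universe)
    family = {frozenset(g) for g in generators} | {universe}
    while True:
        new = {universe - e for e in family}
        new |= {e | f for e, f in combinations(family, 2)}
        if new <= family:
            return family
        family |= new

# Hypothetical example: universe {1, ..., 6} with two generating sets.
A = generated_algebra(range(1, 7), [{1, 2}, {2, 3}])
```

Here the "atoms" are $\{1\}$, $\{2\}$, $\{3\}$, and $\{4, 5, 6\}$, so the resulting family has $2^4 = 16$ members, and it is closed under intersections even though the code only ever takes complements and unions.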

The point of these general definitions is that we'll soon show (in the next lecture) that $\mathcal{M}$, the set of all measurable sets, is a $\sigma$-algebra. (And if we go into measure theory, we'll see more examples of sigma-algebras when we construct measure spaces.)


Example 71 
The simplest sigma-algebra is given by $\mathcal{A} = \{\emptyset, \mathbb{R}\}$, and the next simplest is $\mathcal{A} = \mathcal{P}(\mathbb{R})$. For a slightly more involved example, consider

\[ \mathcal{A} = \{E \subset \mathbb{R} : E \text{ or } E^c \text{ is countable}\}. \]


This last example $\mathcal{A}$ is a $\sigma$-algebra because it's indeed closed under complements, and if we have a collection $\{E_n\}_n \subset \mathcal{A}$ with all sets $E_n$ countable, then the union $\bigcup_n E_n$ is a countable union of countable sets, which is countable (and thus the union is in $\mathcal{A}$). And on the other hand, if there is some $N_0$ such that $E_{N_0}^c$ is countable (instead of $E_{N_0}$), then

\[ \Big(\bigcup_n E_n\Big)^c = \bigcap_n E_n^c \]

is an intersection of sets, one of which is countable, so this union itself has a countable complement (and is thus also in $\mathcal{A}$). So we've verified the necessary conditions, and what we have here is often called the cocountable sigma-algebra.


Proposition 72
Consider the set

\[ \Sigma = \{\mathcal{A} : \mathcal{A} \text{ is a sigma-algebra containing all open subsets of } \mathbb{R}\}. \]

(For example, $\mathcal{P}(\mathbb{R})$ is one of the elements of $\Sigma$.) Then the intersection of all such sigma-algebras

\[ \mathcal{B} = \bigcap_{\mathcal{A} \in \Sigma} \mathcal{A} \]

is the smallest $\sigma$-algebra containing all open subsets of $\mathbb{R}$, and it's called the Borel $\sigma$-algebra.


(The last condition here basically says that if $\mathcal{A}$ is any $\sigma$-algebra in $\Sigma$, then $\mathcal{B}$ is a subset of $\mathcal{A}$.)


Proof. The difficulty of this proof really comes in unpacking the definitions, and the main part of the proof is showing that $\mathcal{B}$ is actually a $\sigma$-algebra. (This is because every open subset is contained in every $\mathcal{A} \in \Sigma$, so it must be an element of $\mathcal{B}$, and because $\mathcal{B}$ is the intersection of all of the $\sigma$-algebras in $\Sigma$, it must be the smallest one: it's a subset of any fixed $\sigma$-algebra in $\Sigma$.)

Verifying that $\mathcal{B}$ is a $\sigma$-algebra will mostly be left to us, but we'll show one part. Suppose that $E \in \mathcal{B}$ is some subset of $\mathbb{R}$: because $E \in \mathcal{A}$ for all $\mathcal{A} \in \Sigma$, we must have $E^c \in \mathcal{A}$ for all $\mathcal{A} \in \Sigma$ (because each element of $\Sigma$ is a $\sigma$-algebra, meaning it is closed under complements). So $E^c$ is in every element of $\Sigma$, meaning that it must also be in $\mathcal{B}$. So we've shown that the Borel $\sigma$-algebra is closed under complements. (The proof of closure under countable unions is similar: those sets in the countable union must be in every $\mathcal{A} \in \Sigma$, and then we can apply closure under countable union within each $\mathcal{A}$.)


We'll show next time that the set of Lebesgue measurable sets is a $\sigma$-algebra, and in fact this set of measurable
sets contains the Borel sigma-algebra $\mathcal{B}$. (Remember that this Borel sigma-algebra is pretty big, because we can take


countable unions and intersections of open sets and end up with a very rich collection of subsets of R.) 


8 March 16, 2021 


Last time, we discussed a few general kinds of collections of subsets of $\mathbb{R}$: recall that an algebra is closed under finite unions and complements, and a $\sigma$-algebra is also closed under countable unions. And the context for this discussion is that we defined the set of (Lebesgue) measurable sets to be the $E \subset \mathbb{R}$ such that

\[ m^*(A) = m^*(A \cap E) + m^*(A \cap E^c) \quad \forall A \subset \mathbb{R}. \]

In other words, $E$ divides sets nicely with respect to outer measure. We then defined the set of all measurable sets $\mathcal{M}$, and we showed last time that these do form an algebra. Today, we'll show that $\mathcal{M}$ is actually also a $\sigma$-algebra, and we'll also show that the Borel sigma-algebra $\mathcal{B}$, which is the smallest $\sigma$-algebra containing all open sets, is a subset of $\mathcal{M}$. (Then we'll be able to define the Lebesgue measure: the measure of any measurable set $E$ is just $m^*(E)$.)


We'll first prove a preliminary result that will make working with countable unions a bit easier: 


Lemma 73 


Let $\mathcal{A}$ be an algebra, and let $\{E_n\}$ be a countable collection of elements of $\mathcal{A}$. Then there exists a disjoint countable collection $\{F_n\}$ of elements of $\mathcal{A}$ such that $\bigcup_n E_n = \bigcup_n F_n$.


In other words, if we want to verify that our collection is closed under taking countable unions (which is a condition for being a $\sigma$-algebra), we can just check that it is closed under countable disjoint unions.


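Proof. Let $G_n = \bigcup_{k=1}^n E_k$, so that we have $G_1 \subset G_2 \subset G_3 \subset \cdots$, and $\bigcup_n E_n = \bigcup_n G_n$ (we can check this for ourselves by checking that every element in the left set is also in the right set, and vice versa). Now define $F_1 = G_1$ and

\[ F_{n+1} = G_{n+1} \setminus G_n \quad \forall n \ge 1. \]

Each $F_n$ is in $\mathcal{A}$ (because algebras are closed under complements and finite intersections), and the $F_n$ are disjoint because the $G_n$ are nested. Then we find that $\bigcup_{k=1}^n F_k = \bigcup_{k=1}^n G_k$ (again, we can do the symbol-pushing if we want to check), so $\bigcup_{k=1}^\infty F_k = \bigcup_{k=1}^\infty G_k$, and this is exactly $\bigcup_{k=1}^\infty E_k$ as desired.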
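The construction in this proof is easy to try on finite sets. The following is a small sketch, with the function name `disjointify` and the sample sets chosen only for illustration; it uses the observation that $G_{n+1} \setminus G_n = E_{n+1} \setminus G_n$.

```python
def disjointify(sets):
    """Given E_1, E_2, ..., return F_1, F_2, ... with F_1 = G_1 and
    F_{n+1} = G_{n+1} minus G_n, where G_n = E_1 ∪ ... ∪ E_n."""
    result = []
    seen = set()  # plays the role of G_n
    for e in sets:
        result.append(set(e) - seen)  # E_{n+1} minus G_n
        seen |= set(e)                # update to G_{n+1}
    return result

E = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4}]  # hypothetical overlapping sets
F = disjointify(E)
```

Some of the $F_n$ may be empty (as happens for the last set here), which is harmless: the collection is still disjoint and has the same union.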


So returning to measurable sets, we'll now show that the collection of Lebesgue measurable sets is a $\sigma$-algebra:


Proposition 74
Let $A \subset \mathbb{R}$, and let $E_1, \dots, E_n$ be disjoint measurable sets. Then

\[ m^*\Big(A \cap \Big[\bigcup_{k=1}^n E_k\Big]\Big) = \sum_{k=1}^n m^*(A \cap E_k). \]


For example, if we had two sets $E$ and $E^c$, the above equality is the definition of $E$ being measurable.


Proof. We prove this by induction. The base case $n = 1$ is clear because both sides are identical. For the inductive step, suppose that we know the equality is true for $n = m$. Suppose we have pairwise disjoint measurable sets $E_1, \dots, E_{m+1}$, and we have some $A \subset \mathbb{R}$. Since $E_k \cap E_{m+1} = \emptyset$ for all $1 \le k \le m$, we find that

\[ A \cap \Big[\bigcup_{k=1}^{m+1} E_k\Big] \cap E_{m+1} = A \cap E_{m+1} \]

(the only intersection comes from $E_{m+1}$ in the big union), and

\[ A \cap \Big[\bigcup_{k=1}^{m+1} E_k\Big] \cap E_{m+1}^c = A \cap \Big[\bigcup_{k=1}^{m} E_k\Big] \]

(we pick up everything else except $E_{m+1}$). Now since $E_{m+1}$ is measurable, we know that

\[ m^*\Big(A \cap \Big[\bigcup_{k=1}^{m+1} E_k\Big]\Big) = m^*\Big(A \cap \Big[\bigcup_{k=1}^{m+1} E_k\Big] \cap E_{m+1}\Big) + m^*\Big(A \cap \Big[\bigcup_{k=1}^{m+1} E_k\Big] \cap E_{m+1}^c\Big), \]

and plugging in the expressions above yields

\[ = m^*(A \cap E_{m+1}) + m^*\Big(A \cap \Big[\bigcup_{k=1}^{m} E_k\Big]\Big), \]

and the induction hypothesis yields

\[ = m^*(A \cap E_{m+1}) + \sum_{k=1}^{m} m^*(A \cap E_k), \]

and combining these two terms gives us exactly what we want.


Theorem 75 


The collection $\mathcal{M}$ of measurable sets is a $\sigma$-algebra.


Proof. We already know that $\mathcal{M}$ is an algebra, and Lemma 73 tells us that it remains to show closure under countable disjoint unions (in other words, the countable disjoint union of a set of measurable sets is measurable). Let $\{E_n\}$ be such a countable collection of disjoint measurable sets with union $E = \bigcup_{n=1}^\infty E_n$: it suffices to show that $m^*(A \cap E^c) + m^*(A \cap E) \le m^*(A)$ (since the reverse inequality is always true).

To show this, take some $N \in \mathbb{N}$. Since $\mathcal{M}$ is an algebra, the finite union $\bigcup_{n=1}^N E_n \in \mathcal{M}$ is measurable, and thus

\[ m^*(A) = m^*\Big(A \cap \bigcup_{n=1}^N E_n\Big) + m^*\Big(A \cap \Big[\bigcup_{n=1}^N E_n\Big]^c\Big). \]

Because $\bigcup_{n=1}^N E_n$ is contained in $E$, its complement $\big[\bigcup_{n=1}^N E_n\big]^c$ contains $E^c$, which means that we can write the inequality

\[ m^*(A) \ge m^*\Big(A \cap \bigcup_{n=1}^N E_n\Big) + m^*(A \cap E^c). \]

Now we can rewrite the first term here (by Proposition 74) to get

\[ m^*(A) \ge \sum_{n=1}^N m^*(A \cap E_n) + m^*(A \cap E^c). \]

Letting $N \to \infty$, we find that

\[ m^*(A) \ge \sum_{n=1}^\infty m^*(A \cap E_n) + m^*(A \cap E^c), \]

and now by countable subadditivity we have that this is

\[ \ge m^*\Big(\bigcup_{n=1}^\infty (A \cap E_n)\Big) + m^*(A \cap E^c) = m^*(A \cap E) + m^*(A \cap E^c), \]

completing the proof.


Remark 76. Remember that the reason for all of this $\sigma$-algebra business is that this kind of structure is imposed on us by our expectations of what a measure should do. Specifically, we wanted the measure of a countable disjoint union of sets to be the sum of the measures of the individual sets, and for that to be true we need to be able to define the measure on an arbitrary countable disjoint union!


Thus, the collection of measurable sets does form a $\sigma$-algebra, and we can now show that it contains $\mathcal{B}$ if we can show that it contains all open sets. We'll start from a simpler case:


Proposition 77 


For all $a \in \mathbb{R}$, the interval $(a, \infty)$ is measurable.


Proof. Suppose we have some subset $A \subset \mathbb{R}$. Define the two sets $A_1 = A \cap (a, \infty)$ and $A_2 = A \cap (-\infty, a]$; we want to show that $m^*(A_1) + m^*(A_2) \le m^*(A)$.

If $m^*(A)$ is infinite, this automatically holds, so suppose that $m^*(A) < \infty$. We'll equivalently show that $m^*(A_1) + m^*(A_2) \le m^*(A) + \varepsilon$ for an arbitrary $\varepsilon > 0$ as follows: let $\{I_n\}$ be a collection of open intervals covering $A$ such that

\[ \sum_n \ell(I_n) \le m^*(A) + \varepsilon \]

(again, we can do this because $m^*(A)$ is the infimum over all collections of intervals). If we now define

\[ J_n = I_n \cap (a, \infty), \qquad K_n = I_n \cap (-\infty, a], \]

then for each $n$, $J_n$ and $K_n$ are each either an interval or empty (because they are intersections of two intervals). Also, $A_1 \subset \bigcup_n J_n$ and $A_2 \subset \bigcup_n K_n$, and we can check that $\ell(I_n) = \ell(J_n) + \ell(K_n)$ for each $n$ (because we're just working with intervals here). Thus,

\[ m^*(A_1) + m^*(A_2) \le \sum_n \big( m^*(J_n) + m^*(K_n) \big) \]

(because $\{J_n\}$ covers $A_1$ and $\{K_n\}$ covers $A_2$), and we can simplify this as

\[ = \sum_n \big( \ell(J_n) + \ell(K_n) \big) = \sum_n \ell(I_n) \le m^*(A) + \varepsilon, \]

and then sending $\varepsilon \to 0$ completes the proof.


From here, it’s actually not too difficult to show that every open set is Lebesgue measurable: 


Theorem 78 


Every open set is measurable, so the Borel $\sigma$-algebra $\mathcal{B}$ is contained in the set of measurable sets $\mathcal{M}$.


Proof. Because $(a, \infty)$ is measurable for all $a$, so is

\[ (-\infty, b) = \bigcup_{n=1}^\infty \Big(-\infty, b - \frac{1}{n}\Big] = \bigcup_{n=1}^\infty \Big(b - \frac{1}{n}, \infty\Big)^c, \]

because the intervals in the last expression are measurable by Proposition 77, meaning their complements are also measurable, and then a countable union is also measurable because $\mathcal{M}$ is a $\sigma$-algebra. And thus any finite open interval

\[ (a, b) = (-\infty, b) \cap (a, \infty) \]

is also measurable because $\sigma$-algebras are closed under intersections (since they're closed under unions and complements, and we can use De Morgan's law). Finally, every open subset of $\mathbb{R}$ is a countable union of open intervals (this is on our homework), which completes the proof because we've shown all open intervals are measurable.


Definition 79 


The Lebesgue measure of a measurable set $E \in \mathcal{M}$ is

\[ m(E) = m^*(E). \]


Finally, this means that we've restricted our outer measure to a set of nicely-behaved sets! And we can now 


immediately get a few useful results about the Lebesgue measure: 


Proposition 80 


If $A, B \in \mathcal{M}$ and $A \subset B$, then $m(A) \le m(B)$. Also, any interval $I$ is measurable, and $m(I) = \ell(I)$.


Proof. These properties are almost all inherited directly from the outer measure, since $m(A) = m^*(A)$ for measurable $A$. The only detail is to check that all intervals (open, closed, or half-closed) are measurable, and we can prove this with arguments like

\[ [a, b] = (b, \infty)^c \cap (-\infty, a)^c, \qquad [a, b) = (-\infty, b) \cap (-\infty, a)^c, \]

and using that the set of measurable sets is a $\sigma$-algebra.


And this result is good, because one of our demands for the Lebesgue measure was that we can measure intervals (and get the expected result back)! We can now check one of the other conditions that we wanted to hold, countable additivity:


Theorem 81 


Suppose that $\{E_n\}$ is a countable collection of disjoint measurable sets. Then

\[ m\Big(\bigcup_n E_n\Big) = \sum_n m(E_n). \]


Remember that outer measure satisfied a similar inequality, but we're claiming that Lebesgue measure gives us 


equality now that we've specialized to “nicer” sets. 


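Proof. We know that the set $\bigcup_n E_n$ is measurable, so we already get one side of the inequality

\[ m\Big(\bigcup_n E_n\Big) = m^*\Big(\bigcup_n E_n\Big) \le \sum_n m^*(E_n) = \sum_n m(E_n) \]

by using the inequality for outer measure. To show the reverse inequality, we will show that $\sum_n m(E_n) \le m\big(\bigcup_n E_n\big)$. For any $N \in \mathbb{N}$, we can rewrite

\[ m\Big(\bigcup_{n=1}^N E_n\Big) = m^*\Big(\mathbb{R} \cap \bigcup_{n=1}^N E_n\Big), \]

and now using Proposition 74 simplifies this to

\[ = \sum_{n=1}^N m^*(\mathbb{R} \cap E_n) = \sum_{n=1}^N m(E_n). \]

So for any finite disjoint collection, the sum of the measures is the measure of the union (which we've basically proved already). But now

\[ \sum_{n=1}^N m(E_n) = m\Big(\bigcup_{n=1}^N E_n\Big) \le m\Big(\bigcup_{n=1}^\infty E_n\Big), \]

and now we have a uniform bound over all $N$, so we can take $N \to \infty$ to find that

\[ \sum_{n=1}^\infty m(E_n) \le m\Big(\bigcup_{n=1}^\infty E_n\Big), \]

as desired.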


The final condition we still need to check is that the Lebesgue measure satisfies translation-invariance: in other 
words, if $E \in \mathcal{M}$ and $x \in \mathbb{R}$, then $m(E + x) = m(E)$ (where we define the set $E + x = \{y + x : y \in E\}$). (And this
is a problem on our problem set.) But the point is that we've now indeed defined a measure on a very rich class of 


subsets of R with the properties that we want! 


Theorem 82 (Continuity of measure) 


Suppose $\{E_k\}$ is a countable collection of measurable sets such that $E_1 \subset E_2 \subset \cdots$. Then

\[ m\Big(\bigcup_{k=1}^\infty E_k\Big) = \lim_{n \to \infty} m\Big(\bigcup_{k=1}^n E_k\Big) = \lim_{n \to \infty} m(E_n). \]


Proof. The equality between the second and third quantities here is because $E_n = \bigcup_{k=1}^n E_k$ by nesting. So it suffices to show the equality between the first and third quantities, and we'll do this by first writing the countable union as a countable disjoint union. Like before, let $F_1 = E_1$ and $F_{k+1} = E_{k+1} \setminus E_k$ for all $k \ge 1$: then each of the $F_k$s is measurable because $F_{k+1} = E_{k+1} \cap E_k^c$, and $\{F_k\}$ is a disjoint collection of measurable sets by nesting. Then for all $n \in \mathbb{N}$, we can check (just like above) that

\[ \bigcup_{k=1}^n F_k = E_n, \qquad \bigcup_{k=1}^\infty F_k = \bigcup_{k=1}^\infty E_k. \]

Therefore,

\[ m\Big(\bigcup_{k=1}^\infty E_k\Big) = m\Big(\bigcup_{k=1}^\infty F_k\Big) = \sum_{k=1}^\infty m(F_k) \]

by countable additivity, and now this sum can be written as

\[ = \lim_{n \to \infty} \sum_{k=1}^n m(F_k) = \lim_{n \to \infty} m\Big(\bigcup_{k=1}^n F_k\Big) = \lim_{n \to \infty} m(E_n), \]

and we've shown the desired equality.
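As a sanity check of the disjoint-pieces argument, here is a minimal sketch assuming the nested sets $E_n = [0, 1 - 1/n]$, an illustrative choice that is not from the lecture. Their union is $[0, 1)$, which has measure $1$, and the pieces are $F_1 = E_1$ and $F_k = (1 - 1/(k-1), 1 - 1/k]$ for $k \ge 2$. The function names `m_E` and `m_F` are hypothetical and simply return interval lengths.

```python
from fractions import Fraction

def m_E(n):
    """Length of E_n = [0, 1 - 1/n]."""
    return 1 - Fraction(1, n)

def m_F(k):
    """Length of F_1 = E_1, and of F_k = E_k minus E_{k-1} for k >= 2."""
    return m_E(1) if k == 1 else m_E(k) - m_E(k - 1)

# Partial sums of the m(F_k), which should equal m(E_n) by finite additivity.
partial_sums = [sum(m_F(k) for k in range(1, n + 1)) for n in range(1, 51)]
# Distance from the measure of the full union [0, 1).
gaps = [1 - s for s in partial_sums]
```

The partial sums reproduce $m(E_n) = 1 - 1/n$, and the gap to $m([0, 1)) = 1$ is exactly $1/n$, which tends to zero as the theorem predicts.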


We'll use the Lebesgue measure to define Lebesgue measurable functions next time, which are the analog of 
continuous functions for Riemann integration. Specifically, if we have a function $f : X \to Y$, then we have continuity if


the preimage of an open set in Y is an open set in X. And we'll see how to make the analogous definition next time! 


9 March 18, 2021


We concluded our discussion of measurable sets last lecture — remember that the motivation is to build towards a 
method of integration that surpasses that of the Riemann integral, so that the set of integrable functions actually 
forms a Banach space. To do that, we wanted to first integrate the simplest kinds of functions, which are 1 on some 
set and 0 on others, and that’s why we cared about defining measure on certain subsets of R (namely the sigma-algebra 
of Lebesgue measurable sets). We won't go through the construction of a non-measurable set — instead, we'll move 


ahead to Lebesgue integration now. 


Fact 83 (Informal) 


If we have an increasing, continuous function $f(x)$ on $[a, b]$, Riemann's approach integrates this function by breaking up the domain into intervals of small width and calculating the area of the rectangles. But Lebesgue's theory of integration started (historically) by thinking about chopping up the range, looking at the piece of $f$ between two values $y_i$ and $y_{i+1}$, finding the corresponding $x_i$ and $x_{i+1}$ where the function intersects at those $y$-values, and forming a rectangle with small vertical width instead of small horizontal width.


It would then make sense to define the integral 


\[ \int_a^b f = \lim_{\text{partition gets small}} \sum_{i=1}^n y_{i-1}\, \ell\big(f^{-1}([y_{i-1}, y_i])\big). \]


In the above description, our function is increasing, so the $x$-values where $f$ is between $y_{i-1}$ and $y_i$ are just a single interval. But in general, the function $f$ can cross a given $y$-value multiple times, and instead we will just have some subset of $[a, b]$ that lies between the desired range.

And this is where measure comes in handy: we know how to measure the “length” of a Lebesgue measurable set, so that is the condition we'll put on objects like the preimage of $[y_{i-1}, y_i]$.

We won't actually define the Lebesgue integral as we do above, because it’s not clear that the result is independent 
of how our sequence of partitions gets smaller. But it is a way that we can integrate a Lebesgue measurable function, 


and it does tell us why we care about the inverse image of closed intervals being measurable. 
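The range-chopping idea can be tried numerically. The sketch below is only an illustration under assumptions: it takes the increasing function $f(x) = x^2$ on $[0, 1]$ (our choice, not the lecture's), chops the range $[0, 1]$ into $n$ equal pieces, and uses the fact that the preimage of $[y_{i-1}, y_i]$ is the interval $[\sqrt{y_{i-1}}, \sqrt{y_i}]$. The function name `lebesgue_lower_sum` is hypothetical.

```python
import math

def lebesgue_lower_sum(n):
    """Chop the range [0, 1] of f(x) = x**2 on [0, 1] into n equal pieces
    and add up y_{i-1} * length(f^{-1}([y_{i-1}, y_i]))."""
    total = 0.0
    for i in range(1, n + 1):
        y_prev, y_cur = (i - 1) / n, i / n
        # f is increasing, so the preimage is [sqrt(y_prev), sqrt(y_cur)].
        length = math.sqrt(y_cur) - math.sqrt(y_prev)
        total += y_prev * length
    return total

approx = lebesgue_lower_sum(1000)
```

Because each rectangle uses the lower value $y_{i-1}$, the sum underestimates $\int_0^1 x^2 \, dx = 1/3$, and the error is at most $1/n$ since the preimage lengths add up to $1$.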


Fact 84 


Throughout this discussion, we'll be considering the extended real numbers $[-\infty, \infty] = \mathbb{R} \cup \{-\infty, \infty\}$, and we'll allow functions to take on the values $\pm\infty$.


Remember (from 18.100) that a sequence of real numbers $\{a_n\}_n$ converges to $\infty$ if for all $R > 0$, there exists an $N \in \mathbb{N}$ such that $a_n > R$ for all $n \ge N$. The rules that we'll have for working with these extended real numbers are that $x \pm \infty = \pm\infty$ for all $x \in \mathbb{R}$, $0 \cdot (\pm\infty) = 0$ (this equality is just about the algebraic objects, not limiting procedures; we'll see why soon), and $x \cdot (\pm\infty) = \pm\infty$ for all $x > 0$ and $\mp\infty$ for $x < 0$.


As mentioned just now, measurable functions should be those where inverse images of closed intervals are measurable sets, and that's almost where we'll start our definition:


Definition 85
Let $E \subset \mathbb{R}$ be measurable, and let $f : E \to [-\infty, \infty]$ be a function. Then $f$ is Lebesgue measurable if for all $a \in \mathbb{R}$, $f^{-1}((a, \infty]) \in \mathcal{M}$ (in other words, the preimage is a measurable set).


We're considering the half-open intervals in this definition, but this isn’t a particularly picky choice: 


Theorem 86 


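Let $E \subset \mathbb{R}$ be a measurable set, and let $f : E \to [-\infty, \infty]$. Then the following are equivalent:

1. For all $a \in \mathbb{R}$, $f^{-1}((a, \infty]) \in \mathcal{M}$,

2. For all $a \in \mathbb{R}$, $f^{-1}([a, \infty]) \in \mathcal{M}$,

3. For all $a \in \mathbb{R}$, $f^{-1}([-\infty, a)) \in \mathcal{M}$,

4. For all $a \in \mathbb{R}$, $f^{-1}([-\infty, a]) \in \mathcal{M}$.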


Proof. First of all, (1) implies (2), because

\[ [a, \infty] = \bigcap_n \Big(a - \frac{1}{n}, \infty\Big], \]

and inverse images respect operations on sets, so

\[ f^{-1}([a, \infty]) = \bigcap_n f^{-1}\Big(\Big(a - \frac{1}{n}, \infty\Big]\Big), \]

and the right-hand side is a countable intersection of measurable sets by assumption and is thus measurable. And (2) implies (1), because for all $a \in \mathbb{R}$,

\[ (a, \infty] = \bigcup_n \Big[a + \frac{1}{n}, \infty\Big] \implies f^{-1}((a, \infty]) = \bigcup_n f^{-1}\Big(\Big[a + \frac{1}{n}, \infty\Big]\Big) \]

is a countable union of Lebesgue measurable sets and is thus Lebesgue measurable. Therefore, (1) and (2) are equivalent. A similar argument shows that (3) and (4) are equivalent as well. Finally,

\[ [-\infty, a) = ([a, \infty])^c, \]

and taking preimages of these and using that complements of measurable sets are measurable yields that (2) and (3) are equivalent, which gives the desired result.


Theorem 87 


If $E$ is measurable, and $f : E \to \mathbb{R}$ is a measurable function, then for all $F \in \mathcal{B}$ (the Borel sigma-algebra), $f^{-1}(F)$ is measurable.


Proof. If $f$ is measurable, then for all intervals $(a, b)$, we have

\[ f^{-1}((a, b)) = f^{-1}([-\infty, b) \cap (a, \infty]) = f^{-1}([-\infty, b)) \cap f^{-1}((a, \infty]), \]

and both sets on the right-hand side are measurable and thus so is their intersection. Thus the preimage of each open interval is measurable, and similar to how we concluded that open sets are measurable, we can use the fact that every open set can be written as a countable union of open intervals to show that $f^{-1}(U)$ is measurable for all open $U \subset \mathbb{R}$. Thus,

\[ \mathcal{A} = \{F \subset \mathbb{R} : f^{-1}(F) \text{ measurable}\} \]

is a sigma-algebra that contains all open sets, and thus $\mathcal{B}$ must be a subset of $\mathcal{A}$, as desired.


Thus, measurable functions make the preimage of Borel sets measurable, and we can also throw $\pm\infty$ into the mix:


Theorem 88


If $f : E \to [-\infty, \infty]$ is measurable, then $f^{-1}(\{\infty\})$ and $f^{-1}(\{-\infty\})$ are measurable as well.


Proof. We can write

\[ f^{-1}(\{\infty\}) = \bigcap_n f^{-1}((n, \infty]), \]

and because each set in the countable intersection on the right is measurable, so is the countable intersection. Similarly, $f^{-1}(\{-\infty\}) = \bigcap_n f^{-1}([-\infty, -n))$, and by using Theorem 86, we again see that the set we care about is the countable intersection of a bunch of Lebesgue measurable sets and is thus measurable.


This tells us that the inverse image of any Borel set, possibly tossing in $\pm\infty$, is always measurable for measurable


functions. 


Example 89 


If $f : \mathbb{R} \to \mathbb{R}$ is continuous, then it is measurable. (This is a good sanity check, because continuous functions are Riemann integrable.)

To show this, notice that

$$f^{-1}((a, \infty]) = f^{-1}((a, \infty))$$

is the preimage of an open set and is thus open and thus measurable.




Example 90 


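If $E, F \subset \mathbb{R}$ are two measurable sets, then the indicator function $\chi_F : E \to \mathbb{R}$,

$$\chi_F(x) = \begin{cases} 1 & x \in F, \\ 0 & x \notin F, \end{cases}$$

is measurable.

This one can be checked by direct computation:

$$\chi_F^{-1}((a, \infty]) = \begin{cases} \emptyset & a \geq 1, \\ E \cap F & 0 \leq a < 1, \\ E & a < 0, \end{cases}$$

and all of these sets are measurable.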


Theorem 91 


Let $E \subset \mathbb{R}$ be measurable, and suppose $f, g : E \to \mathbb{R}$ are two measurable functions and $c \in \mathbb{R}$. Then $cf$, $f + g$, and $fg$ are all measurable functions.


This is useful to have because we will end up with $L^p$ spaces for integrable functions, which are often added together and multiplied.


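Proof. We basically want to check the definition of measurability. For scalar multiplication, if $c = 0$, then $cf = 0$ is a continuous function, so it is measurable by Example 89. Otherwise, if $a \in \mathbb{R}$ and $c > 0$, then

$$cf(x) > a \iff f(x) > \frac{a}{c},$$

so the inverse image $(cf)^{-1}((a, \infty]) = f^{-1}\left(\left(\frac{a}{c}, \infty\right]\right)$ is measurable for any $a$ (because $f$ is measurable). If instead $c < 0$, the inequality flips, and $(cf)^{-1}((a, \infty]) = f^{-1}\left(\left[-\infty, \frac{a}{c}\right)\right)$ is measurable by Theorem 86. And this is exactly the condition for $cf$ to be measurable.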


Next, we'll consider the sum of two measurable functions. If $a \in \mathbb{R}$, then we'll check preimages via

$$f(x) + g(x) > a \iff f(x) > a - g(x) \iff f(x) > r > a - g(x)$$

for some rational number $r$, since there is a rational number between any two distinct real numbers. And that means that $f(x) + g(x) > a$ exactly when there exists some $r \in \mathbb{Q}$ such that

$$x \in f^{-1}((r, \infty]) \cap g^{-1}((a - r, \infty]),$$

and both sets in the intersection are measurable by assumption, so the intersection is also measurable. Thus the preimage is

$$(f + g)^{-1}((a, \infty]) = \bigcup_{r \in \mathbb{Q}} \left( f^{-1}((r, \infty]) \cap g^{-1}((a - r, \infty]) \right),$$

which is measurable (because we're taking countable intersections and unions, using that the rationals are countable), so $f + g$ is measurable.


Finally, for the product $fg$, we'll pull a trick: we'll first show that $f^2$ is measurable. Because $f^2$ is a nonnegative function, for any $a < 0$,

$$(f^2)^{-1}((a, \infty]) = E$$

(the entire domain maps within $(a, \infty]$), which is measurable by assumption. The other case is where $a \geq 0$, in which case $f(x)^2 > a$ if and only if $f(x) > \sqrt{a}$ or $f(x) < -\sqrt{a}$. So

$$(f^2)^{-1}((a, \infty]) = f^{-1}((\sqrt{a}, \infty]) \cup f^{-1}([-\infty, -\sqrt{a})),$$

and both of the sets on the right here are measurable (again using Theorem 86), so the union is measurable and thus $f^2$ is measurable. We finish by noticing that

$$fg = \frac{1}{4}\left( (f + g)^2 - (f - g)^2 \right),$$

and $f + g$ and $f - g$ are measurable because $f, g$ are measurable, and we can scale by $-1$ or add functions together. Every operation we take here preserves measurability, and thus we've shown that the product of two measurable functions is measurable, as desired.


(Notice that the functions above only go from $E \to \mathbb{R}$, and that's because we wanted to avoid $\infty - \infty$ showing up in some of the functions.) All of those properties we showed above also work for Riemann integration, so this isn't really anything special yet. What makes Lebesgue integration stand out is that we have closure under taking limits.


Theorem 92 


Let $E \subset \mathbb{R}$ be measurable, and let $f_n : E \to [-\infty, \infty]$ be a sequence of measurable functions. Then the functions $g_1(x) = \sup_n f_n(x)$, $g_2(x) = \inf_n f_n(x)$, $g_3(x) = \limsup_{n \to \infty} f_n(x) = \inf_n \left[ \sup_{k \geq n} f_k(x) \right]$, and $g_4(x) = \liminf_{n \to \infty} f_n(x) = \sup_n \left[ \inf_{k \geq n} f_k(x) \right]$ are all measurable functions.


Proof. To check that the pointwise supremum is measurable, notice that

$$x \in g_1^{-1}((a, \infty]) \iff \sup_n f_n(x) > a,$$

which occurs if and only if there is some $n$ where $f_n(x) > a$:

$$\iff x \in f_n^{-1}((a, \infty]) \text{ for some } n \iff x \in \bigcup_n f_n^{-1}((a, \infty]).$$

Since each set in the countable union is measurable, so is the union, and thus the preimage of $(a, \infty]$ under $g_1$ is indeed measurable (meaning $g_1$ is measurable). And very similarly (this time we'll include $a$ in the set), we can check that

$$x \in g_2^{-1}([a, \infty]) \iff x \in \bigcap_n f_n^{-1}([a, \infty]),$$

and each $f_n^{-1}([a, \infty])$ is measurable, so the intersection is also measurable (meaning $g_2$ is measurable).

Finally, $g_3$ is the infimum of a sequence of functions defined as supremums of the $f_n$s, and $g_4$ is the supremum of a sequence of functions defined as infimums of the $f_n$s. Since we've shown closure under infs and sups, that means we get the result for $g_3$ and $g_4$ immediately.


Corollary 93 


Let $E \subset \mathbb{R}$ be measurable, and let $f_n : E \to [-\infty, \infty]$ be measurable for all $n$. If $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in E$, then $f$ is measurable.

Proof. If we have pointwise convergence of the functions, then $f(x) = \limsup_{n \to \infty} f_n(x) = \liminf_{n \to \infty} f_n(x)$ is measurable by Theorem 92.


And in fact, the analogous statement is false for Riemann integration (the pointwise limit of Riemann integrable functions is not always Riemann integrable), so we're starting to see a difference between the Riemann and Lebesgue approaches.


Example 94 


The set $\mathbb{Q} \cap [0, 1]$ is countable, so we can enumerate its elements as $\{r_1, r_2, r_3, \cdots\}$. Then the functions

$$f_n(x) = \begin{cases} 1 & x \in \{r_1, \cdots, r_n\}, \\ 0 & \text{otherwise} \end{cases}$$

are each Riemann integrable (because they are piecewise continuous), but their pointwise limit is the indicator function $\chi_{\mathbb{Q} \cap [0,1]}$, which is not Riemann integrable.
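As a concrete illustration of this example (not part of the lecture), here is a minimal Python sketch. The particular enumeration of the rationals in $[0, 1]$ (ordered by denominator) and the helper names `rationals_in_unit_interval`, `first_rationals`, and `f` are our own assumptions; any enumeration would work equally well.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval():
    """Yield every rational in [0, 1] exactly once, ordered by denominator.

    This ordering is an arbitrary choice made for illustration.
    """
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:  # lowest terms only, so nothing repeats
                yield Fraction(p, q)
        q += 1


def first_rationals(n):
    """Return the list [r_1, ..., r_n] for the enumeration above."""
    gen = rationals_in_unit_interval()
    return [next(gen) for _ in range(n)]


def f(n, x):
    """f_n(x): 1 if x is one of r_1, ..., r_n, and 0 otherwise."""
    return 1 if x in first_rationals(n) else 0


# At a fixed rational point, f_n(x) is 0 until x shows up in the enumeration
# and is 1 from then on, which is the pointwise convergence to the indicator.
x = Fraction(2, 5)
values = [f(n, x) for n in (1, 5, 10, 20)]
```

Each `f(n, .)` is nonzero at only `n` points, while the pointwise limit is nonzero at every rational in $[0, 1]$.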


As an important note, being Lebesgue integrable and being measurable are two different things (measurable functions are candidates for being integrable), and in fact the pointwise limit of Lebesgue integrable functions will not always be Lebesgue integrable, but it will be under an additional mild condition. So we're on track to develop a stronger theory of integration here!


Definition 95 


Let $E$ be a measurable set. A statement $P(x)$ holds almost everywhere (a.e.) on $E$ if

$$m(\{x \in E : P(x) \text{ does not hold}\}) = 0.$$

(It may seem like we're asking for the set to both be measurable and have measure zero, but remember that any set with outer measure zero is measurable and has measure zero. So replacing $m$ with $m^*$ above will give us the same statement.) And the idea here is that sets of measure zero don't affect measurability:


Theorem 96 


If two functions $f, g : E \to [-\infty, \infty]$ satisfy $f = g$ a.e. on $E$, and $f$ is measurable, then $g$ is measurable.


In other words, changing a measurable function on a set of measure zero keeps it measurable. 


Proof. Let $N = \{x \in E : f(x) \neq g(x)\}$: by assumption, this set has outer measure zero, so $m(N) = 0$. Then for $a \in \mathbb{R}$,

$$N_a = \{x \in N : g(x) > a\} \subset N$$

also has measure zero (because $m^*(N_a) \leq m^*(N) = 0$). Therefore,

$$g^{-1}((a, \infty]) = \left( f^{-1}((a, \infty]) \cap N^c \right) \cup N_a$$

(because the preimages are the same outside of $N$, and then we also have to account for the set where $g(x) > a$ and $g$ doesn't agree with $f$). But $N$ is measurable, so $N^c$ is measurable, and thus the intersection $f^{-1}((a, \infty]) \cap N^c$ is measurable. Finally, $N_a$ is also measurable (it has measure zero), so the final expression on the right is indeed measurable, proving that $g$ is measurable as desired.

We'll extend this idea of measurability to (finite) complex numbers next time, and then we'll show that a particular class of functions are the universal measurable functions. That will allow us to define the Lebesgue integral for certain nonnegative functions, and from there, we'll be able to move towards proving that the set of Lebesgue integrable functions forms a Banach space.


10 March 23, 2021 


Last time, we introduced the idea of a measurable function: recall that a function $f : E \to [-\infty, \infty]$ (where $E \subset \mathbb{R}$ is measurable) is measurable if $f^{-1}((a, \infty])$ is measurable for all $a \in \mathbb{R}$. (And because we can generate open sets with these half-open intervals, that shows that the preimage of any Borel set will be measurable as well.) We also showed that measurable functions are closed under linear combinations, infs, sups, and limits, and that changing a function on a set of measure zero preserves measurability.

Everything we've done so far is for extended real-valued functions, but often we'll be dealing with complex-valued 
functions instead, and we'll extend our definition accordingly. Recall that we can write any complex-valued function $f$ as $\mathrm{Re}(f) + i \cdot \mathrm{Im}(f)$:


Definition 97 


If $E \subset \mathbb{R}$ is measurable, a complex-valued function $f : E \to \mathbb{C}$ is measurable if $\mathrm{Re}(f)$ and $\mathrm{Im}(f)$ (which are both functions $E \to \mathbb{R}$) are measurable.


We can verify the following results (some will be assigned to our homework, and others follow from arguments 


similar to the ones made last lecture): 


Theorem 98 
If $f, g : E \to \mathbb{C}$ are measurable functions, and $\alpha \in \mathbb{C}$, then the functions $\alpha f$, $f + g$, $fg$, $\overline{f}$, $|f|$ are all measurable functions.


Theorem 99 
If $f_n : E \to \mathbb{C}$ is measurable for all $n$, and we have pointwise convergence $f_n(x) \to f(x)$ for all $x \in E$, then $f$ is measurable.


For example, we can prove this last fact by noticing that

$$\lim_{n \to \infty} f_n(x) = f(x) \iff \lim_{n \to \infty} \mathrm{Re}(f_n(x)) = \mathrm{Re}(f(x)) \text{ and } \lim_{n \to \infty} \mathrm{Im}(f_n(x)) = \mathrm{Im}(f(x)),$$

and we can apply the results we know about real-valued measurable functions to get measurability of $\mathrm{Re}(f)$ and $\mathrm{Im}(f)$, which proves measurability of $f$. So in general, we don't need to work too hard to prove these results!

Last lecture, we showed that continuous functions are measurable, and so are indicator functions $\chi_E$ for measurable sets $E$. Theorem 98 then tells us that complex linear combinations of indicator functions are also measurable, and those are “simple” because they only take on finitely many values:


Definition 100 


A measurable function $\phi : E \to \mathbb{C}$ is simple (or a simple function) if $|\phi(E)|$ (the size of the range) is finite.

The idea is that every measurable function can be approximated by simple functions, but we'll talk about that soon. And to connect this definition to the “linear combination of indicator functions” idea, suppose that the range $\phi(E)$ is the set of distinct values $\{a_1, \cdots, a_n\}$. Then we can define the sets

$$A_i = \phi^{-1}(\{a_i\}),$$

which are all measurable (because they're the intersections of the sets where $\mathrm{Re}(\phi) = \mathrm{Re}(a_i)$, and also where $\mathrm{Im}(\phi) = \mathrm{Im}(a_i)$). And then for all $i \neq j$, we know that $A_i \cap A_j = \emptyset$, and $\bigcup_{i=1}^{n} A_i = E$ (basically, here we're saying that the finitely many $A_i$s partition the domain based on the value of the function $\phi$). So for all $x \in E$, we can write

$$\phi(x) = \sum_{i=1}^{n} a_i \chi_{A_i}(x),$$

and thus any simple function is indeed a complex linear combination of finitely many indicator functions.


Proposition 101 


Scalar multiples, linear combinations, and products of simple functions are again simple functions. 


(We can check in all cases that the resulting functions are still measurable, and also that their range includes finitely 


many values.) 


Theorem 102 
If $f : E \to [0, \infty]$ is a nonnegative measurable function, then there exists a sequence of simple functions $\{\phi_n\}$ such that the following properties hold:

(a) We have a pointwise increasing sequence of functions dominated by $f$: $0 \leq \phi_0(x) \leq \phi_1(x) \leq \cdots \leq f(x)$ for all $x \in E$.

(b) Pointwise convergence holds: $\lim_{n \to \infty} \phi_n(x) = f(x)$ for all $x \in E$.

(c) For all $B > 0$, $\phi_n \to f$ converges uniformly on the set $\{x \in E : f(x) \leq B\}$ where $f$ is bounded.


(This proof will basically carry over to the extended real-valued functions, and also the complex-valued functions. 


But we'll explain soon what the difference is.) 


Proof. The idea will be to build our functions $\phi_n$ to have better and better resolution ($2^{-n}$) and larger and larger range ($2^n$). Essentially, $\phi_0$ will only be able to tell whether the function is larger than 1 (we'll only let it take on the values 0 and 1, being 1 if $f > 1$ and 0 otherwise), $\phi_1$ will be able to tell the values of functions up to 2 (resolving at intervals of $\frac{1}{2}$, so that it can take on the values $0, \frac{1}{2}, 1, \frac{3}{2}, 2$), and so on. And we claim that this sequence of approximations satisfies the three conditions we want above.
Formally, we define the sets

$$E_n^k = \{x \in E : k 2^{-n} < f(x) \leq (k + 1) 2^{-n}\}$$

for all integers $n \geq 0$ and $0 \leq k \leq 2^{2n} - 1$. (This is the “interval of length $2^{-n}$” described above, and this is another way to write the inverse image $f^{-1}((k 2^{-n}, (k + 1) 2^{-n}])$, which is measurable.) We'll also define

$$F_n = f^{-1}((2^n, \infty])$$

(another measurable set which grabs the part of the function $f$ that we missed above), and that finally allows us to define

$$\phi_n = \sum_{k=0}^{2^{2n} - 1} (k 2^{-n}) \chi_{E_n^k} + 2^n \chi_{F_n}$$

(remembering that $k 2^{-n}$ is a lower bound for the function on the set $E_n^k$). For example, we would have

$$\phi_1 = 0 \cdot \chi_{f^{-1}((0, \frac{1}{2}])} + \frac{1}{2} \cdot \chi_{f^{-1}((\frac{1}{2}, 1])} + 1 \cdot \chi_{f^{-1}((1, \frac{3}{2}])} + \frac{3}{2} \cdot \chi_{f^{-1}((\frac{3}{2}, 2])} + 2 \cdot \chi_{f^{-1}((2, \infty])}.$$


It is indeed true that $\phi_n$ takes on finitely many values for each $n$, so $\phi_n$ is always a simple function, and by design, $0 \leq \phi_n \leq f$. (For example, if $f(x) = 1.7$ at some point $x$, then we fall within the $(\frac{3}{2}, 2]$ range, and then $\phi_1$ takes on the lower bound of that range, $\frac{3}{2}$.) More rigorously, if $x \in E_n^k$, then

$$k 2^{-n} < f(x) \leq (k + 1) 2^{-n} \implies \phi_n(x) = k 2^{-n} < f(x),$$

and otherwise $x \in F_n$, which means $f(x) > 2^n = \phi_n(x)$. (The only remaining points are those where $f(x) = 0$, and there $\phi_n(x) = 0$ as well.) All that remains for proving part (a) is to show that the $\phi_n$s are pointwise increasing: notice that if $x \in E_n^k$, then

$$k 2^{-n} < f(x) \leq (k + 1) 2^{-n} \implies (2k) 2^{-(n+1)} < f(x) \leq (2k + 2) 2^{-(n+1)},$$

which implies that $x \in E_{n+1}^{2k} \cup E_{n+1}^{2k+1}$. And we can check in both cases that $\phi_{n+1}(x)$ is at least $\phi_n(x)$: if $x \in E_{n+1}^{2k}$, then

$$\phi_n(x) = k 2^{-n} = (2k) 2^{-(n+1)} = \phi_{n+1}(x),$$

and otherwise $x \in E_{n+1}^{2k+1}$, which means that

$$\phi_n(x) = k 2^{-n} = (2k) 2^{-(n+1)} < (2k + 1) 2^{-(n+1)} = \phi_{n+1}(x).$$

Finally, if $x \in F_n$, then $\phi_n(x) \leq \phi_{n+1}(x)$ by a similar argument. So we've shown that $\phi_n(x) \leq \phi_{n+1}(x)$ on each of the sets $F_n$ and $E_n^k$ (which, together with the set where $f = 0$, partition $E$), and thus part (a) is proven.

We can now prove (b) and (c) because of the following: we claim that for all $x \in \{y \in E : f(y) \leq 2^n\}$,

$$0 \leq f(x) - \phi_n(x) \leq 2^{-n}.$$

Once we show this claim, we can show part (b) because for any $x$, either $f(x) = \infty$ (this case is easy to verify) or $x$ is in the set $\{y \in E : f(y) \leq 2^n\}$ for $n$ large enough, so then for sufficiently large $n$ we have $|f(x) - \phi_n(x)| \leq 2^{-n}$, which is enough for pointwise convergence. And part (c) follows because for any fixed $B$, we can pick an $N$ so that $\{x \in E : f(x) \leq B\}$ is contained in $\{x \in E : f(x) \leq 2^N\}$, and then the bound in the claim also shows uniform convergence.

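So in order to show the claim, remember that $\phi_n$ cuts up our range into intervals of resolution $2^{-n}$: since

$$\{y \in E : 0 < f(y) \leq 2^n\} = \bigcup_{k=0}^{2^{2n} - 1} E_n^k$$

(and the claim is immediate at points where $f = 0$, since $\phi_n = 0$ there), we can just check the claim on each individual $E_n^k$. And indeed, if $x \in E_n^k$, then

$$\phi_n(x) = k 2^{-n} < f(x) \leq (k + 1) 2^{-n} \implies f(x) - \phi_n(x) \leq (k + 1) 2^{-n} - k 2^{-n} = 2^{-n},$$

as desired, completing the proof.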
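The construction in this proof only depends on the value $y = f(x)$, so it is easy to sketch in code. The following Python snippet is an illustration under our own naming (the helper `phi` is hypothetical, not from the lecture): it evaluates the $n$-th dyadic approximation at a point where $f$ takes the value $y$, following the half-open intervals $(k 2^{-n}, (k+1) 2^{-n}]$ used above.

```python
import math


def phi(n, y):
    """Value of the n-th dyadic approximation at a point where f equals y.

    y in (k 2^-n, (k+1) 2^-n] is sent to k 2^-n as long as y <= 2^n, and
    anything above 2^n (including infinity) is capped at 2^n.
    """
    if y <= 0:
        return 0.0
    if y > 2 ** n:  # this branch also handles y = math.inf
        return float(2 ** n)
    # Unique integer k with k 2^-n < y <= (k+1) 2^-n.
    k = math.ceil(y * 2 ** n) - 1
    return k / 2 ** n


# The values increase with n and stay below y = 1.7, as in part (a).
approximations = [phi(n, 1.7) for n in range(5)]
# approximations is [1.0, 1.5, 1.5, 1.625, 1.6875]
```

Once $2^n \geq y$, the error `y - phi(n, y)` is at most $2^{-n}$, which is the claim used for parts (b) and (c); at `y = math.inf` the values are $2^n$, which increase to infinity.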


Now, as promised, we'll extend this proof to the extended real numbers and the complex numbers:

Definition 103

Let $f : E \to [-\infty, \infty]$ be a measurable function. Then we define the positive part and negative part of $f$ via

$$f^+(x) = \max(f(x), 0), \quad f^-(x) = \max(-f(x), 0),$$

so that $f(x) = f^+(x) - f^-(x)$ and $|f(x)| = f^+(x) + f^-(x)$.

We know that $f^+$ and $f^-$ are indeed measurable (for example because they are the supremum of the sequences $\{f, 0, 0, \cdots\}$ and $\{-f, 0, 0, \cdots\}$), and they are also nonnegative by definition.
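As a quick sanity check of the two identities in Definition 103, here is a small Python sketch acting on a single value $y = f(x)$, including the extended values $\pm\infty$. The function names `positive_part` and `negative_part` are our own and purely illustrative.

```python
import math


def positive_part(y):
    """f^+ at a point where f takes the value y."""
    return max(y, 0.0)


def negative_part(y):
    """f^- at a point where f takes the value y (note that it is nonnegative)."""
    return max(-y, 0.0)


# Both parts are nonnegative, and at most one of them is nonzero, so the
# difference f^+ - f^- never produces an "infinity minus infinity".
samples = [-2.5, 0.0, 3.0, math.inf, -math.inf]
parts = [(positive_part(y), negative_part(y)) for y in samples]
```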


Theorem 104 
Let $E \subset \mathbb{R}$ be measurable and $f : E \to \mathbb{C}$ be measurable. Then there exists a sequence of simple functions $\{\phi_n\}$ such that the following three properties hold:

(a) We again have pointwise increasing functions, in the sense that $0 \leq |\phi_0(x)| \leq |\phi_1(x)| \leq \cdots \leq |f(x)|$ for all $x \in E$.

(b) Again, we have pointwise convergence $\lim_{n \to \infty} \phi_n(x) = f(x)$ for all $x \in E$.

(c) For all $B > 0$, $\phi_n \to f$ converges uniformly on the set $\{x \in E : |f(x)| \leq B\}$.


It's left to us to fill in the details, but the idea is to apply Theorem 102 after splitting up the function f into its real 
and imaginary parts, and then further splitting those up into their positive and negative parts. The linear combinations 
of the simple functions that arise from each of those parts will then give us the desired approximation for f. 

The significance of this result is that we now have a way to define an integral by looking at the limit of these 
types of simple functions, and the Lebesgue integral can be defined this way. But we'd run into issues of whether the 
integral depends on the simple function representation, so we'll do something different here. 

Our first step is to start with Lebesgue integrals of nonnegative functions (to avoid things like $\infty - \infty$, and because as we've just seen, knowing properties for nonnegative functions will then generalize to all complex-valued functions).


Definition 105 


If $E \subset \mathbb{R}$ is measurable, we define the class

$$L^+(E) = \{f : E \to [0, \infty] : f \text{ measurable}\}.$$


We'll try to define a Lebesgue integral for these functions, and we'll start with the simple ones: 


Definition 106 
Suppose $\phi \in L^+(E)$ is a simple function such that $\phi = \sum_{j=1}^{n} a_j \chi_{A_j}$, where $A_i \cap A_j = \emptyset$ for all $i \neq j$ and $\bigcup_{j=1}^{n} A_j = E$. Then the Lebesgue integral of $\phi$ is

$$\int_E \phi = \sum_{j=1}^{n} a_j m(A_j) \in [0, \infty].$$

(We may write $\int_E \phi$ as $\int_E \phi \, dx$ as well.) Basically, we split up our simple function in a canonical way into combinations of indicator functions on disjoint sets, and then we think of the integral of each of those pieces as the “rectangle” with area equal to its length times height.

Theorem 107


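Suppose $\phi, \psi \in L^+(E)$ are two simple functions. Then for any $c \geq 0$, we have the following identities:

1. $\int_E c\phi = c \int_E \phi$,

2. $\int_E (\phi + \psi) = \int_E \phi + \int_E \psi$,

3. $\int_E \phi \leq \int_E \psi$ if $\phi \leq \psi$, and

4. if $F \subset E$ is measurable, then $\int_F \phi = \int_E \chi_F \phi \leq \int_E \phi$.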


Proof. We can prove (1) by noticing that if $\phi = \sum_{j=1}^{n} a_j \chi_{A_j}$, then $c\phi = \sum_{j=1}^{n} (c a_j) \chi_{A_j}$, so

$$\int_E c\phi = \sum_{j=1}^{n} c a_j m(A_j) = c \sum_{j=1}^{n} a_j m(A_j) = c \int_E \phi.$$

(This proof is not too hard because the decomposition over sets $A_j$ is the same in both cases.) For (2), we can again write $\phi = \sum_{j=1}^{n} a_j \chi_{A_j}$ and write $\psi = \sum_{k=1}^{m} b_k \chi_{B_k}$, and then we can write

$$E = \bigcup_{j=1}^{n} A_j = \bigcup_{k=1}^{m} B_k \implies A_j = \bigcup_{k=1}^{m} (A_j \cap B_k), \quad B_k = \bigcup_{j=1}^{n} (A_j \cap B_k),$$

and all of these unions are disjoint because the $A_j$s are pairwise disjoint, and so are the $B_k$s. Therefore, the additivity property of Lebesgue measure tells us that

$$\int_E \phi + \int_E \psi = \sum_{j} a_j m(A_j) + \sum_{k} b_k m(B_k)$$

can be rewritten as

$$= \sum_{j,k} a_j m(A_j \cap B_k) + \sum_{j,k} b_k m(A_j \cap B_k) = \sum_{j,k} (a_j + b_k) m(A_j \cap B_k).$$

But the sum of the two simple functions $\phi + \psi$ can be written as

$$\phi + \psi = \sum_{j,k} (a_j + b_k) \chi_{A_j \cap B_k},$$

where technically this is no longer our canonical decomposition because it's possible for the different $a_j + b_k$s to be equal to each other for different pairs $(j, k)$, but that's okay because we can just combine those disjoint sets together where the function is equal. So indeed $\int_E (\phi + \psi) = \sum_{j,k} (a_j + b_k) m(A_j \cap B_k)$, and we've shown the desired equality for (2).

Next, for (3), assume $\phi, \psi$ are written in their canonical way. Then $\phi(x) \leq \psi(x)$ for all $x$ if and only if $a_j \leq b_k$ whenever $A_j \cap B_k \neq \emptyset$. So now additivity of the Lebesgue measure tells us that

$$\int_E \phi = \sum_{j=1}^{n} a_j m(A_j) = \sum_{j,k} a_j m(A_j \cap B_k),$$

and now whenever a term here is nonzero we know that $A_j \cap B_k$ is nonempty, so $a_j \leq b_k$. And thus we can bound this as

$$\leq \sum_{j,k} b_k m(A_j \cap B_k) = \sum_{k=1}^{m} b_k m(B_k) = \int_E \psi.$$

(Finally, part (4) will be left as a simple exercise to us.)

So we've now defined our “area under the curve” for Lebesgue integrals, and this is an indication that Riemann integrable functions will indeed be Lebesgue integrable (because step functions are indeed Riemann integrable and we have agreement there). Next time, we'll go from here to defining the integral of a nonnegative measurable function, and we'll prove some properties (including two important convergence theorems) along the way.
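To make the definition of the integral of a simple function and the refinement argument for additivity concrete, here is a minimal Python sketch. It makes a strong simplifying assumption that is ours, not the lecture's: every set $A_j$ is a finite union of disjoint intervals, stored as a list of `(left, right)` pairs, so its measure is just a sum of lengths (endpoints have measure zero and are ignored). All function names are hypothetical.

```python
from fractions import Fraction


def measure(intervals):
    """Lebesgue measure of a finite union of disjoint intervals."""
    return sum(b - a for a, b in intervals)


def integral(simple):
    """Sum of a_j * m(A_j) for a simple function stored as [(a_j, A_j), ...]."""
    return sum(a * measure(A) for a, A in simple)


def intersect(A, B):
    """Intersection of two finite unions of disjoint intervals."""
    out = []
    for a1, b1 in A:
        for a2, b2 in B:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo < hi:  # keep only pieces of positive length
                out.append((lo, hi))
    return out


def add(phi, psi):
    """phi + psi on the common refinement A_j & B_k, as in the proof of (2)."""
    return [(a + b, intersect(A, B))
            for a, A in phi for b, B in psi if intersect(A, B)]


# Two simple functions on E = [0, 2].
half = Fraction(1, 2)
phi = [(1, [(0, 1)]), (3, [(1, 2)])]        # 1 on [0, 1], 3 on [1, 2]
psi = [(2, [(0, half)]), (0, [(half, 2)])]  # 2 on [0, 1/2], 0 elsewhere

total = integral(add(phi, psi))
```

Here `integral(phi)` is 4, `integral(psi)` is 1, and `total` is 5, matching part (2) of Theorem 107. As in the proof, the refinement returned by `add` is not necessarily canonical (two pieces may carry the same value), but that does not change the sum.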


11 March 30, 2021 


Last time, we defined the Lebesgue integral for simple functions: for any simple function $\phi$ written in the canonical way $\sum_{j} a_j \chi_{A_j}$ for disjoint sets $A_j$, we have $\int_E \phi = \sum_{j} a_j m(A_j)$, and we proved some properties about this integral last time (we have linearity of the integral, if $f(x) \leq g(x)$ for all $x$, then $\int_E f \leq \int_E g$, and so on). Today, we'll define the integral for general nonnegative measurable functions, and much like Riemann sums give better and better approximations for Riemann integrals as the rectangles become thinner, we can think of Lebesgue integrals as being the result of a similar limiting procedure.

We saw last time already that for a nonnegative measurable function f, we can always find a sequence of simple 
functions that increase pointwise to f. So it makes sense to try to define the Lebesgue integral as the limit of the 
integrals of the simple functions, but then we run into issues where the final integral may depend on the specific 


sequence of simple functions that we chose. 


Definition 108 
Let $f \in L^+(E)$. Then the Lebesgue integral of $f$ is

$$\int_E f = \sup \left\{ \int_E \phi : \phi \in L^+(E) \text{ simple}, \ \phi \leq f \right\}.$$


Proposition 109 
Let $E \subset \mathbb{R}$ be a set with $m(E) = 0$. Then for all $f \in L^+(E)$, we have $\int_E f = 0$.

In other words, it's only interesting to take integrals over sets of positive measure. (And this is sort of like how Riemann integrals over a point are always zero.)

Proof. Working from the definition, start with our function $f \in L^+(E)$. If $\phi$ is a simple function in the canonical form $\sum_{j=1}^{n} a_j \chi_{A_j}$ with $\phi \leq f$, then $m(A_j) \leq m(E) = 0$, so in the sum $\sum_{j=1}^{n} a_j m(A_j)$, all terms must be zero. So we always have $\int_E \phi = 0$, and the supremum over all simple functions $\phi$ is also zero, as desired.


We can also verify a bunch of results that were true of the Lebesgue integral for simple functions: 


Proposition 110 


If $\phi \in L^+(E)$ is a simple function, then the two definitions of $\int_E \phi$ (for simple functions and general nonnegative measurable functions) agree with each other. If $f, g \in L^+(E)$, $c \in [0, \infty)$ is a nonnegative real number, and $f \leq g$, then we have $\int_E cf = c \int_E f$ and $\int_E f \leq \int_E g$. Finally, if $f \in L^+(E)$ and $F \subset E$ is measurable, then $\int_F f \leq \int_E f$.

(The proof will be left for our homework, but the idea is that taking supremums shouldn't change our inequalities.)

We can actually relax the second statement here to an “almost-everywhere” statement as well:

Proposition 111

If $f, g \in L^+(E)$, and $f \leq g$ almost everywhere on $E$, then $\int_E f \leq \int_E g$.


Proof. Define the set $F = \{x \in E : f(x) \leq g(x)\}$; this is a measurable set because $g - f$ is measurable, so the inverse image of $[0, \infty]$ is measurable (with some small details about how functions behave at $\infty$, but we're dealing with that on our homework). By assumption, $m(F^c) = 0$, and thus by Proposition 109 and Proposition 110,

$$\int_E f = \int_F f + \int_{F^c} f = \int_F f \leq \int_F g = \int_F g + \int_{F^c} g = \int_E g,$$

as desired.

In particular, if we know that $f = g$ almost everywhere on $E$, then $\int_E f = \int_E g$. We may notice that we're missing the linearity that we had for simple functions: we haven't mentioned that $\int_E f + \int_E g = \int_E (f + g)$. To prove that, we'll need one of the big three results in Lebesgue integration:


Theorem 112 (Monotone Convergence Theorem) 
If $\{f_n\}$ is a sequence of nonnegative measurable functions (in $L^+(E)$) such that $f_1 \leq f_2 \leq \cdots$ pointwise on $E$, and $f_n \to f$ pointwise on $E$ for some $f$ (which will be in $L^+(E)$ because the pointwise limit of measurable functions is measurable), then

$$\lim_{n \to \infty} \int_E f_n = \int_E f.$$

Notice that the assumption of pointwise convergence here is much weaker than the uniform convergence we usually need to assume for Riemann integration.


Proof. Since $f_1 \leq f_2 \leq \cdots$, we know that $\int_E f_1 \leq \int_E f_2 \leq \cdots$. Thus, $\{\int_E f_n\}$ is an increasing sequence of nonnegative numbers, meaning that the limit $\lim_{n \to \infty} \int_E f_n$ exists in $[0, \infty]$. Furthermore, because $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x$, we know that $f_n \leq f$ for all $n$, which means that $\int_E f$ (which is also some number in $[0, \infty]$) must satisfy

$$\int_E f_n \leq \int_E f \implies \lim_{n \to \infty} \int_E f_n \leq \int_E f.$$

It suffices to prove the reverse inequality (that $\int_E f \leq \lim_{n \to \infty} \int_E f_n$), and we can show this by showing that $\int_E \phi \leq \lim_{n \to \infty} \int_E f_n$ for every simple function $\phi \leq f$ (the point being that eventually $f_n$ will be nearly as large as $\phi$). We'll first take some $\varepsilon \in (0, 1)$ as “breathing room.” If $\phi = \sum_{j=1}^{m} a_j \chi_{A_j}$ is an arbitrary simple function with $\phi \leq f$, then we can define the set

$$E_n = \{x \in E : f_n(x) \geq (1 - \varepsilon) \phi(x)\}.$$

Since $(1 - \varepsilon)\phi(x) < f(x)$ whenever $\phi(x) > 0$ (we have strict inequality now that $\varepsilon$ is positive), and $\lim_{n \to \infty} f_n(x) = f(x)$, every $x$ must lie in some $E_n$ (points with $\phi(x) = 0$ lie in every $E_n$). Therefore, we have

$$\bigcup_{n=1}^{\infty} E_n = E.$$

Furthermore, because $f_1 \leq f_2 \leq \cdots$, we know that $E_1 \subset E_2 \subset \cdots$ (the sets $E_n$ are increasing by inclusion). So now notice that

$$\int_E f_n \geq \int_{E_n} f_n \geq \int_{E_n} (1 - \varepsilon) \phi = (1 - \varepsilon) \int_{E_n} \phi = (1 - \varepsilon) \sum_{j=1}^{m} a_j m(A_j \cap E_n)$$

(because the inequality holds on $E_n$, and the $A_j \cap E_n$ are measurable and disjoint). And now, because $E_n$ increases to $E$, and therefore $E_1 \cap A_j \subset E_2 \cap A_j \subset \cdots$ increases to $A_j$, continuity of Lebesgue measure tells us that as $n \to \infty$, $m(A_j \cap E_n) \to m(A_j)$. Therefore, we can take limits on both sides and find (because we have a finite sum on the right-hand side) that for all $\varepsilon \in (0, 1)$, we have

$$\lim_{n \to \infty} \int_E f_n \geq \lim_{n \to \infty} (1 - \varepsilon) \sum_{j=1}^{m} a_j m(A_j \cap E_n) = (1 - \varepsilon) \sum_{j=1}^{m} a_j m(A_j) = (1 - \varepsilon) \int_E \phi.$$

Taking $\varepsilon \to 0$ yields the desired inequality $\int_E \phi \leq \lim_{n \to \infty} \int_E f_n$, and combining the two inequalities finishes the proof.
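As an illustration of the theorem (an example of our own, not from the lecture), take $E = (0, 1]$, $f(x) = x^{-1/2}$, and the truncations $f_n = \min(f, n)$, which increase pointwise to $f$. The sketch below assumes something the notes have not established at this point: that the Lebesgue integral of each bounded, continuous $f_n$ agrees with its Riemann integral. Under that assumption, $\int_E f_n = 2 - \frac{1}{n}$, so the Monotone Convergence Theorem gives $\int_E f = 2$ even though $f$ is unbounded.

```python
from fractions import Fraction


def integral_truncated(n):
    """Exact integral over (0, 1] of min(n, x**-0.5), for an integer n >= 1.

    The truncation equals n on (0, 1/n^2] and x**-0.5 on [1/n^2, 1], so the
    integral is n * (1/n^2) + (2 - 2/n) = 2 - 1/n.
    """
    n = Fraction(n)
    return n * (1 / n ** 2) + (2 - 2 / n)


def midpoint_estimate(n, steps=200_000):
    """Crude midpoint-rule cross-check of the same integral (floating point)."""
    h = 1.0 / steps
    return sum(min(n, ((i + 0.5) * h) ** -0.5) for i in range(steps)) * h


# The integrals increase with n and approach 2.
values = [integral_truncated(n) for n in (1, 2, 10, 1000)]
```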


With this result, we now have tools for evaluating Lebesgue integrals that aren't just using the definition directly: 


Corollary 113 
Let $f \in L^+(E)$, and let $\{\phi_n\}_n$ be a sequence of simple functions such that $0 \leq \phi_1 \leq \phi_2 \leq \cdots \leq f$, with $\phi_n \to f$ pointwise. Then $\int_E f = \lim_{n \to \infty} \int_E \phi_n$.

In other words, we can take any pointwise increasing sequence of simple functions and compute the limit, instead of needing to compute the supremum explicitly. (And this follows because we can just plug in the $\phi_n$s as $f_n$s into the Monotone Convergence Theorem.)


Corollary 114 


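If $f, g \in L^+(E)$, then $\int_E (f + g) = \int_E f + \int_E g$.

Proof. Let $\{\phi_n\}_n$ and $\{\psi_n\}_n$ be two sequences of simple functions, such that $0 \leq \phi_1 \leq \phi_2 \leq \cdots \leq f$ and $\phi_n \to f$ pointwise, and similarly $0 \leq \psi_1 \leq \psi_2 \leq \cdots \leq g$ and $\psi_n \to g$ pointwise. Then we have

$$0 \leq \phi_1 + \psi_1 \leq \phi_2 + \psi_2 \leq \cdots \leq f + g,$$

where $\phi_n + \psi_n \to f + g$ pointwise, and each $\phi_n + \psi_n$ is a simple function (because it's the sum of two simple functions). Then the Monotone Convergence Theorem tells us that

$$\int_E (f + g) = \lim_{n \to \infty} \int_E (\phi_n + \psi_n) = \lim_{n \to \infty} \left( \int_E \phi_n + \int_E \psi_n \right)$$

by using linearity for simple functions, and then the Monotone Convergence Theorem again tells us that this is $\int_E f + \int_E g$, as desired.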


In fact, we have something stronger than finite additivity:

Theorem 115
Let $\{f_n\}_n$ be a sequence in $L^+(E)$. Then

$$\int_E \sum_{n} f_n = \sum_{n} \int_E f_n.$$

(The left-hand side is defined here, because we're summing a bunch of nonnegative real numbers pointwise, and we're allowing $\infty$ as an output of our functions.)

Proof. By induction, Corollary 114 tells us that for each $N$, we have

$$\int_E \sum_{n=1}^{N} f_n = \sum_{n=1}^{N} \int_E f_n.$$

Now because

$$\sum_{n=1}^{1} f_n \leq \sum_{n=1}^{2} f_n \leq \cdots,$$

and by definition of the infinite sum, we have pointwise convergence $\sum_{n=1}^{N} f_n \to \sum_{n=1}^{\infty} f_n$ as $N \to \infty$, the Monotone Convergence Theorem tells us that

$$\int_E \sum_{n=1}^{\infty} f_n = \lim_{N \to \infty} \int_E \sum_{n=1}^{N} f_n = \lim_{N \to \infty} \sum_{n=1}^{N} \int_E f_n = \sum_{n=1}^{\infty} \int_E f_n,$$

as desired.

(And again, this kind of result is not going to hold for Riemann integration, if for example we enumerate the rationals and let $f_n$ be the function which is 1 at the first $n$ rational numbers and 0 everywhere else.)
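For a concrete instance of termwise integration (our own example, not from the lecture), take $E = [0, 1]$ and $f_n(x) = x^n (1 - x)$ for $n \geq 0$. Pointwise, $\sum_{n=0}^{N} f_n(x) = 1 - x^{N+1}$, so the full sum is $1$ on $[0, 1)$ and its integral is $1$. The Python sketch below checks that the integrals $\int_E f_n = \frac{1}{n+1} - \frac{1}{n+2}$ sum to the same value; it assumes that the Lebesgue and Riemann integrals agree for these polynomials, which the notes have not yet proven at this point. The function names are hypothetical.

```python
from fractions import Fraction


def integral_f(n):
    """Integral over [0, 1] of x**n * (1 - x), which is 1/(n+1) - 1/(n+2)."""
    return Fraction(1, n + 1) - Fraction(1, n + 2)


def partial_sum_of_integrals(N):
    """Sum of the integrals of f_0, ..., f_N; telescopes to 1 - 1/(N+2)."""
    return sum(integral_f(n) for n in range(N + 1))


def partial_sum_pointwise(N, x):
    """Sum of f_0(x), ..., f_N(x); telescopes to 1 - x**(N+1)."""
    return sum(x ** n * (1 - x) for n in range(N + 1))


# Both sides of the theorem approach 1 as N grows.
gaps = [1 - partial_sum_of_integrals(N) for N in (0, 8, 98)]
```

The list `gaps` consists of the values $\frac{1}{N+2}$, which shrink to zero, consistent with $\sum_n \int_E f_n = 1 = \int_E \sum_n f_n$.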


Theorem 116 


Let $f \in L^+(E)$. Then $\int_E f = 0$ if and only if $f = 0$ almost everywhere on $E$.

Proof. First of all, if $f = 0$ almost everywhere, then $f \leq 0$ almost everywhere, meaning $\int_E f \leq \int_E 0 = 0$, so the integral is indeed zero. For the other direction, define

$$F_n = \left\{x \in E : f(x) > \frac{1}{n}\right\}, \quad F = \{x \in E : f(x) > 0\}.$$

We know that $F = \bigcup_{n=1}^{\infty} F_n$ (because whenever $f(x) > 0$, we have $f(x) > \frac{1}{n}$ for some large enough $n$), and we also have $F_1 \subset F_2 \subset \cdots$. Now we can compute

$$0 \leq \frac{1}{n} m(F_n) = \int_{F_n} \frac{1}{n} \leq \int_{F_n} f \leq \int_E f = 0,$$

which means that $\frac{1}{n} m(F_n) = 0 \implies m(F_n) = 0$ for all $n$, and thus by continuity of measure

$$m(F) = m\left( \bigcup_{n=1}^{\infty} F_n \right) = \lim_{n \to \infty} m(F_n) = 0,$$

as desired.


We can now slightly relax the assumptions of the Monotone Convergence Theorem as well: 


Theorem 117 
If $\{f_n\}_n$ is a sequence in $L^+(E)$ such that $f_1(x) \leq f_2(x) \leq \cdots$ for almost all $x \in E$ and $\lim_{n \to \infty} f_n(x) = f(x)$ for almost all $x \in E$, then

$$\int_E f = \lim_{n \to \infty} \int_E f_n.$$

Proof. Let $F$ be the set of $x \in E$ where the two assumptions above hold. By assumption, $m(E \setminus F) = 0$, so $f - \chi_F f = 0$ and $f_n - \chi_F f_n = 0$ almost everywhere for all $n$. The Monotone Convergence Theorem then tells us that

$$\int_E f = \int_E f \chi_F = \int_F f = \lim_{n \to \infty} \int_F f_n,$$

where the first equality holds because the two functions $f, f\chi_F$ are equal almost everywhere, and the third equality holds because $\{f_n\}$ satisfy the assumptions of the Monotone Convergence Theorem on $F$. We can then simplify this to

$$= \lim_{n \to \infty} \int_E f_n \chi_F = \lim_{n \to \infty} \int_E f_n$$

because $E \setminus F$ has measure zero, so any integral over that region is zero.


In other words, sets of measure zero don't affect our Lebesgue integral. 
We're now ready for the second big convergence theorem — it’s equivalent to the Monotone Convergence Theorem, 


but it’s often a useful restatement: 


Theorem 118 (Fatou's lemma) 
Let $\{f_n\}_n$ be a sequence in $L^+(E)$. Then

$$\int_E \liminf_{n \to \infty} f_n(x) \leq \liminf_{n \to \infty} \int_E f_n(x).$$

(Recall that we define the liminf of a sequence via

$$\liminf_{n \to \infty} a_n = \sup_{n \geq 1} \left[ \inf_{k \geq n} a_k \right],$$

and then the liminf function is defined pointwise.)
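The inequality in Fatou's lemma can be strict. A standard example (added here for illustration, not taken from the lecture) is $f_n = n \chi_{(0, 1/n)}$ on $E = [0, 1]$: each $f_n$ is a simple function with $\int_E f_n = n \cdot \frac{1}{n} = 1$, yet $f_n(x) \to 0$ for every $x$, so the left-hand side is $0$ while the right-hand side is $1$. The Python sketch below checks this at a few sample points; the helper names are our own, and `tail_inf` uses a finite horizon as a stand-in for the infimum over all $k \geq n$.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n times the indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Integral of the simple function f_n: the value n times m((0, 1/n))."""
    return n * Fraction(1, n)


def tail_inf(x, n, horizon):
    """inf of f_k(x) over n <= k < horizon (finite stand-in for k >= n)."""
    return min(f(k, x) for k in range(n, horizon))


# For fixed x > 0, f_k(x) = 0 as soon as 1/k <= x, so the tail infimum is 0.
sample_points = [Fraction(0), Fraction(1, 100), Fraction(1, 3), Fraction(1)]
tails = [tail_inf(x, 200, 400) for x in sample_points]
```

Since $f_k(x) = 0$ for every $k \geq 1/x$, the true infimum over all $k \geq n$ is also $0$ at each point, so the finite horizon does not change the conclusion.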
Proof. We know that

$$\liminf_{n \to \infty} f_n(x) = \sup_{n \geq 1} \left[ \inf_{k \geq n} f_k(x) \right],$$

and the expression inside the brackets on the right is increasing in $n$ (since we're taking an infimum over a smaller set). So the supremum on the right-hand side is actually a limit of a pointwise increasing sequence of functions:

$$= \lim_{n \to \infty} \left[ \inf_{k \geq n} f_k(x) \right].$$

So now by the Monotone Convergence Theorem, we have

$$\int_E \liminf_{n \to \infty} f_n = \lim_{n \to \infty} \int_E \left( \inf_{k \geq n} f_k \right),$$

and now for all $j \geq n$ and for all $x \in E$, we know that $\inf_{k \geq n} f_k(x) \leq f_j(x)$. So for all $j \geq n$, we have a fixed bound

$$\int_E \inf_{k \geq n} f_k \leq \int_E f_j,$$

and thus we can take the infimum over all $j$ on the right-hand side and still have a valid inequality:

$$\int_E \inf_{k \geq n} f_k \leq \inf_{j \geq n} \int_E f_j.$$

So we've successfully “swapped the integral and infimum,” and plugging this into the Monotone Convergence Theorem equality above yields

$$\int_E \liminf_{n \to \infty} f_n = \lim_{n \to \infty} \int_E \left( \inf_{k \geq n} f_k \right) \leq \lim_{n \to \infty} \left[ \inf_{j \geq n} \int_E f_j \right] = \liminf_{n \to \infty} \int_E f_n,$$

as desired.

We might be worried about the fact that our functions can take on infinite values, and this next result basically says that we don't need to worry too much:


Theorem 119 


Let $f \in L^+(E)$, and suppose that $\int_E f < \infty$. Then the set $\{x \in E : f(x) = \infty\}$ is a set of measure zero.

Proof. Define the set $F = \{x \in E : f(x) = \infty\}$. We know that for all $n$, we have $n \chi_F \leq f \chi_F$, so integrating both sides yields

$$n \, m(F) \leq \int_E f \chi_F \leq \int_E f < \infty.$$

Therefore, for all $n$, $m(F) \leq \frac{1}{n} \int_E f$, which goes to 0 as $n \to \infty$, so we must have $m(F) = 0$.

Our next steps will be to define the set of all Lebesgue integrable functions, prove some more properties of the Lebesgue integral, and then start looking into $L^p$ spaces (the motivation for this theory of integration in the first place).


12 April 1, 2021 


Last time, we defined the Lebesgue integral of a nonnegative measurable function, and we're going to extend that 


definition today: 


Definition 120 


Let $E \subset \mathbb{R}$ be measurable. A measurable function $f : E \to \mathbb{R}$ is Lebesgue integrable over $E$ if $\int_E |f| < \infty$.

(Recall that we can break up a function $f$ as $f^+ - f^-$, where $f^+$ and $f^-$ are the positive and negative parts of $f$; both are nonnegative functions.) Then $|f| = f^+ + f^-$ (which we've previously shown is measurable), so we have the integral

$$\int_E |f| = \int_E f^+ + \int_E f^-.$$

Since the left-hand side is infinite if and only if one of the two terms on the right-hand side is infinite, being Lebesgue integrable is then equivalent to $f^+$ and $f^-$ both being Lebesgue integrable. So that makes the next definition valid:


Definition 121 
The Lebesgue integral of an integrable function f : E > R Is 


fre fr-fr. 


This is meaningful because we're only defining this when both terms on the right-hand side are finite, so we're 


never subtracting things with infinities. 
Proposition 122
Suppose $f, g : E \to \mathbb{R}$ are integrable.

1. For all $c \in \mathbb{R}$, $cf$ is integrable with $\int_E cf = c\int_E f$,

2. The sum $f + g$ is integrable with $\int_E (f + g) = \int_E f + \int_E g$, and

3. If $A, B$ are disjoint measurable sets, then $\int_{A \cup B} f = \int_A f + \int_B f$.


Proof. For (1), scaling by $c \ne 0$ either swaps or doesn't change the positive and negative parts of $f$ (depending on whether $c$ is positive or negative), so this is not too complicated and we can verify the details ourselves (given the analogous linearity results for nonnegative measurable functions).

For (2), notice that $|f + g| \le |f| + |g|$, so by the results for nonnegative measurable functions
\[ \int_E |f + g| \le \int_E (|f| + |g|) = \int_E |f| + \int_E |g| < \infty. \]
So $f + g$ is indeed integrable, and then
\[ f + g = (f^+ + g^+) - (f^- + g^-) \]
(though note that we're not saying that $f^+ + g^+$ is the positive part of $f + g$ here), which means that if we split up the left-hand side into positive and negative parts, we get
\[ (f + g)^+ + (f^- + g^-) = (f^+ + g^+) + (f + g)^-. \]
Then each term here is a nonnegative measurable function, so linearity tells us that
\[ \int_E (f + g)^+ + \int_E (f^- + g^-) = \int_E (f^+ + g^+) + \int_E (f + g)^-. \]
Rearranging a little gives
\[ \int_E (f + g)^+ - \int_E (f + g)^- = \int_E (f^+ + g^+) - \int_E (f^- + g^-), \]
and then the definition of the Lebesgue integral on the left side and linearity on the right side gives us
\[ \int_E (f + g) = \int_E f^+ - \int_E f^- + \int_E g^+ - \int_E g^- = \int_E f + \int_E g, \]
as desired.

Finally, (3) follows from (2), the fact that
\[ f\chi_{A \cup B} = f\chi_A + f\chi_B \]
when $A$ and $B$ are two disjoint sets, and the fact that $\int_E f\chi_F = \int_F f$ (for measurable $F \subset E$) for general integrable functions $f$, because we can break everything up into positive and negative parts here as well.
Proposition 123

Suppose $f, g : E \to \mathbb{R}$ are measurable functions. Then we have the following:

1. If $f$ is integrable, then $\left|\int_E f\right| \le \int_E |f|$.

2. If $g$ is integrable, and $f = g$ almost everywhere, then $f$ is integrable and $\int_E f = \int_E g$.

3. If $f, g$ are integrable and $f \le g$ almost everywhere, then $\int_E f \le \int_E g$.


Proof. Result (1) follows from the fact that
\[ \left|\int_E f\right| = \left|\int_E f^+ - \int_E f^-\right| \le \left|\int_E f^+\right| + \left|\int_E f^-\right| \]
(first step by definition, second step by the triangle inequality for numbers), and then, since both integrals are nonnegative, we can simplify this further by linearity as
\[ = \int_E (f^+ + f^-) = \int_E |f|. \]

For (2), we know that $|f| = |g|$ almost everywhere, so from results for nonnegative measurable functions, we know that $\int_E |f| = \int_E |g| < \infty$. So $f$ satisfies the condition for being integrable, and then $|f - g|$ is nonnegative and zero almost everywhere. So (using part (1))
\[ \left|\int_E f - \int_E g\right| = \left|\int_E (f - g)\right| \le \int_E |f - g| = 0, \]
since the integral of a nonnegative measurable function which is zero almost everywhere is 0. This implies that the integrals are the same.

Finally, for (3), we can define a function
\[ h(x) = \begin{cases} g(x) - f(x) & g(x) \ge f(x), \\ 0 & \text{otherwise.} \end{cases} \]
This is a nonnegative measurable function, and $h = g - f$ almost everywhere, so
\[ 0 \le \int_E h = \int_E (g - f) \]
by part (2), and then linearity gives us
\[ 0 \le \int_E (g - f) = \int_E g - \int_E f, \]
and this chain of relations gives us the desired result.


Remark 124. Compact subsets of R are Borel sets, so they are measurable and have finite measure. So simple functions 
that are nonzero only on a compact subset of R will be integrable (because we have a finite sum of coefficients times 
finite measures). For another example, continuous functions f on a closed, bounded interval [a, b] also have continuous 
absolute value, so they attain some finite maximum c. Thus the integral of |f| is indeed finite by monotonicity (it’s 


at most c(b— a)). So a continuous function on a closed and bounded interval is also Lebesgue integrable. 


But we'll prove something stronger than that in just a minute, using this next result (which is one of the most 


useful that we'll encounter in integration theory): 
Theorem 125 (Dominated Convergence Theorem)

Let $g : E \to [0, \infty)$ be a nonnegative integrable function, and let $\{f_n\}_n$ be a sequence of real-valued measurable functions such that (1) $|f_n| \le g$ almost everywhere for all $n$ and (2) there exists a function $f : E \to \mathbb{R}$ so that $f_n(x) \to f(x)$ pointwise almost everywhere on $E$. Then
\[ \lim_{n \to \infty} \int_E f_n = \int_E f. \]

This result is much stronger than anything we can say in Riemann integration: we only require pointwise convergence and an additional condition that the functions are all bounded in absolute value by some fixed integrable function.


Proof. Because we know that $|f_n| \le g$ almost everywhere, we know that $f_n$ is integrable for each $n$. Furthermore, because $f_n \to f$ almost everywhere, $f$ is measurable (because the pointwise limit of measurable functions is measurable) and $|f| \le g$ almost everywhere, so $f$ is also integrable.

Also, because changing $f$ and $f_n$ on a set of measure zero does not change the value of the Lebesgue integrals, we will assume that the assumptions in the theorem statement actually hold everywhere on $E$ (for example, just set the functions to all be 0 on that set of measure zero).

To start the proof, notice that
\[ \left|\int_E f_n\right| \le \int_E |f_n| \le \int_E g, \]
so the sequence $\left\{\int_E f_n\right\}_n$ is a bounded sequence of real numbers, meaning that it has a finite liminf and limsup. We will show that those two values are the same and equal to $\int_E f$. First of all, because $g \pm f_n \ge 0$ for all $n$,
\[ \int_E (g - f) = \int_E \liminf_{n \to \infty} (g - f_n) \le \liminf_{n \to \infty} \int_E (g - f_n), \]
where the first step is by pointwise convergence and the second step is by Fatou's lemma. And then by linearity, this is
\[ = \int_E g - \limsup_{n \to \infty} \int_E f_n, \]
since $g$ has no $n$-dependence, and flipping the sign of a liminf gives us the limsup. Similarly, we can find that
\[ \int_E (g + f) \le \int_E g + \liminf_{n \to \infty} \int_E f_n. \]
All of the quantities here are finite numbers, and thus we find that (by linearity again)
\[ \boxed{\limsup_{n \to \infty} \int_E f_n} \le \int_E g - \int_E (g - f) = \boxed{\int_E f} = \int_E (g + f) - \int_E g \le \boxed{\liminf_{n \to \infty} \int_E f_n}. \]
But the liminf is always at most the limsup, so these three boxed numbers are equal, as desired.
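To see why the domination hypothesis cannot simply be dropped, here is a small sketch (not from the lecture) using the standard example $f_n = n\chi_{(0,1/n)}$ on $[0,1]$. These functions converge to 0 at every point, but each has integral 1, so the integrals do not converge to the integral of the limit. No integrable $g$ dominates all of them, since $\sup_n f_n(x)$ grows roughly like $1/x$ near 0. The helper names `f_n` and `integral_f_n` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def f_n(n, x):
    """Value of f_n = n * (indicator of the open interval (0, 1/n)) at x."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f_n(n):
    """Lebesgue integral of the simple function f_n: value times measure."""
    return n * Fraction(1, n)


x = Fraction(1, 7)  # any fixed point of (0, 1] behaves the same way
pointwise = [f_n(n, x) for n in range(1, 20)]      # eventually all zeros
integrals = [integral_f_n(n) for n in range(1, 20)]  # always exactly 1
```

At the fixed point $x = 1/7$ the values are zero from $n = 7$ onward, while every integral equals 1, so the conclusion of the theorem fails without a dominating function.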


We can now use this to prove some other useful results: 


Theorem 126 
Let $f \in C([a, b])$ for some real numbers $a < b$. (We know that this function is measurable.) Then $\int_{[a,b]} f = \int_a^b f(x)\,dx$: in other words, $f$ is integrable and the Riemann and Lebesgue integrals agree.


Proof. First, because $f \in C([a, b])$ is continuous, so is $|f|$, and every continuous function on a closed and bounded interval is bounded. Thus there exists some $B \ge 0$ so that $|f| \le B$ on $[a, b]$, and thus
\[ \int_{[a,b]} |f| \le \int_{[a,b]} B = B\, m([a, b]) = B(b - a) < \infty. \]
So continuous functions are indeed Lebesgue integrable. Now the positive part and negative part of $f$ are continuous, because we can write
\[ f^+ = \frac{|f| + f}{2}, \qquad f^- = \frac{|f| - f}{2}. \]
So by linearity, it suffices to show the result for nonnegative $f$, since we can split up the Lebesgue and Riemann integrals into positive and negative parts and verify the result in both cases.


Suppose we have a sequence of partitions
\[ x^n = \{a = x_0^n < x_1^n < \cdots < x_{m_n}^n = b\} \]
of $[a, b]$, so that the norm of the partition $\|x^n\| = \max_{1 \le j \le m_n} |x_j^n - x_{j-1}^n|$ goes to 0 as $n \to \infty$. Recall that the Riemann integral is defined in terms of Riemann sums based on these partitions, and our goal is to show that the sequence of Riemann sums converges to our Lebesgue integral. Now for each $j, n$, we define $\xi_j^n \in [x_{j-1}^n, x_j^n]$ to be the point in the interval at which the minimum is achieved (this exists by the Extreme Value Theorem):
\[ \inf_{x \in [x_{j-1}^n, x_j^n]} f(x) = f(\xi_j^n). \]
By the theory of Riemann integration, we then know that the lower Riemann sums converge to the Riemann integral:
\[ \lim_{n \to \infty} \sum_{j=1}^{m_n} f(\xi_j^n)(x_j^n - x_{j-1}^n) = \int_a^b f(x)\,dx. \]


But now each $x^n$ is a finite set of points, and we can define
\[ N = \bigcup_{n=1}^{\infty} x^n, \]
which is a countable union of countable sets and is thus countable. So in particular, we have $m(N) = 0$, and now we can look at the function
\[ f_n = \sum_{j=1}^{m_n} f(\xi_j^n)\chi_{(x_{j-1}^n, x_j^n)} + 0 \cdot \chi_{x^n}, \]
which is a nonnegative simple function for each $n$ which basically traces out the lower Riemann sum (since we choose the minimum value on each interval of the partition). And as we make the partition finer and finer, the approximate areas converge to the full Riemann integral of $f$, but we can also think about the integrals of each $f_n$ as the Lebesgue integral of certain simple functions. In particular, the Lebesgue integral
\[ \int_{[a,b]} f_n = \sum_{j=1}^{m_n} f(\xi_j^n)\, m\big((x_{j-1}^n, x_j^n)\big) = \sum_{j=1}^{m_n} f(\xi_j^n)(x_j^n - x_{j-1}^n) \]
is exactly the Riemann sum, and now we want to apply the Dominated Convergence Theorem: we just need to show that $f_n \to f$ pointwise almost everywhere and that they are all bounded by an integrable function, because that would imply $\lim_{n \to \infty} \int_{[a,b]} f_n = \int_{[a,b]} f$, and we know the left-hand side is the Riemann integral because it's the limit of the Riemann sums.

To show that the $f_n$s are all bounded by an integrable function, notice that $0 \le f_n(x) \le f(x)$ for all $x \in [a, b] \setminus N$, and we've already shown that $f$ is integrable. For pointwise convergence (everywhere except $N$), pick some $x \in [a, b] \setminus N$, and let $\varepsilon > 0$. Because $f$ is a continuous function at $x$, we know that there exists some $\delta > 0$ so that for all $|x - y| < \delta$, $|f(x) - f(y)| < \varepsilon$. And because the partitions get finer and finer (the norms of the partitions go to 0), there is some $M$ so that for all $n \ge M$, we have $\max_{1 \le j \le m_n} (x_j^n - x_{j-1}^n) < \delta$. So for all $n \ge M$, we know that $x$ is part of a partition interval of length less than $\delta$, and
\[ f_n(x) = \sum_{j=1}^{m_n} f(\xi_j^n)\chi_{(x_{j-1}^n, x_j^n)}(x) = f(\xi_k^n) \]
for some unique $k$ such that $x \in (x_{k-1}^n, x_k^n)$ (remembering that by definition, $x$ is not one of the partition points). So for all $n \ge M$, we have
\[ |f(x) - f_n(x)| = |f(x) - f(\xi_k^n)| < \varepsilon, \]
since $|x - \xi_k^n| < \delta$ (we have two points within an interval of length less than $\delta$). Thus we've shown that $f_n(x) \to f(x)$ for all $x \in [a, b] \setminus N$, which means that we have pointwise convergence almost everywhere. Remembering that the $f_n$ are all dominated by $f$, the Dominated Convergence Theorem then gives us the desired result: writing out the argument in more detail,
\[ \int_{[a,b]} f = \lim_{n \to \infty} \int_{[a,b]} f_n = \lim_{n \to \infty} \sum_{j=1}^{m_n} f(\xi_j^n)(x_j^n - x_{j-1}^n) = \int_a^b f(x)\,dx. \]
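The simple functions used in this proof can be imitated numerically. The following is a minimal sketch, assuming a monotone integrand so that the minimum on each subinterval sits at an endpoint (the proof itself only needs continuity). The function name `lower_riemann_sum` and the choice $f(x) = x^2$ on $[0, 1]$ are illustrative assumptions, not part of the lecture.

```python
def lower_riemann_sum(f, a, b, m):
    """Lower Riemann sum of a monotone f on [a, b] with m equal subintervals.

    Assumption: f is monotone, so its minimum on each subinterval is attained
    at one of the two endpoints (this plays the role of the point xi_j^n).
    The result is also the Lebesgue integral of the simple function f_n.
    """
    h = (b - a) / m
    total = 0.0
    for j in range(1, m + 1):
        left, right = a + (j - 1) * h, a + j * h
        # value of the simple function times the measure of (x_{j-1}, x_j)
        total += min(f(left), f(right)) * h
    return total


def square(x):
    return x * x


# Finer partitions give larger lower sums, approaching the integral 1/3.
sums = [lower_riemann_sum(square, 0.0, 1.0, m) for m in (10, 100, 1000)]
```

For this integrand the lower sums increase toward $1/3$ from below, which matches the domination $0 \le f_n \le f$ used in the proof.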


So now everything we've proved for real integrable functions will also carry over to complex-valued integrable functions: we define $f : E \to \mathbb{C}$ to be Lebesgue integrable if $\int_E |f| < \infty$, in which case we define
\[ \int_E f = \int_E \operatorname{Re} f + i \int_E \operatorname{Im} f. \]
Then results like linearity of the integral and the Lebesgue Dominated Convergence Theorem also generalize. Here's an example of that in action:


Proposition 127 


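If $f : E \to \mathbb{C}$ is integrable, then $\left|\int_E f\right| \le \int_E |f|$.

Proof. If $\int_E f = 0$, this inequality is clear. Otherwise, define the complex number
\[ \alpha = \frac{\overline{\int_E f}}{\left|\int_E f\right|} \]
(the integral of $f$ over $E$ is a complex number, and we want its normalized conjugate). Then $|\alpha| = 1$, and
\[ \left|\int_E f\right| = \alpha \int_E f = \int_E \alpha f \]
(first step by definition of the norm for a complex number, second step by linearity), and because the left-hand side is a real number, so is $\int_E \alpha f$, and thus this is equal to
\[ = \operatorname{Re} \int_E \alpha f = \int_E \operatorname{Re}(\alpha f) \le \int_E |\operatorname{Re}(\alpha f)| \]
by the triangle inequality for real-valued functions. And now $|\operatorname{Re}(z)| \le |z|$ for all complex numbers $z$, so this can be simplified as
\[ \le \int_E |\alpha f| = \int_E |f|, \]
since $|\alpha| = 1$.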
We'll finish our discussion of measure and integration by introducing the $L^p$ spaces next time, showing that they're Banach spaces and proving a few other properties.


13. April 6, 2021 


We'll complete our discussion of Lebesgue measure and integration today, finding the “complete space of integrable 
functions” that contains the space of continuous functions. Last time, we defined the class of Lebesgue integrable 
functions and the Lebesgue integral, and we proved the Dominated Convergence Theorem (which we then used to 
show that a continuous function on a closed and bounded interval has the Riemann and Lebesgue integral agree with 
each other). And it can in fact be shown (in a measure theory class) that every Riemann integrable function on 
a closed and bounded interval is Lebesgue integrable and that those two integrals will agree, and this way we can 


completely characterize the functions which are Riemann integrable: they must be continuous almost everywhere. 


Definition 128 


Let $f : E \to \mathbb{C}$ be a measurable function. For any $1 \le p < \infty$, we define the $L^p$ norm
\[ \|f\|_{L^p(E)} = \left(\int_E |f|^p\right)^{1/p}. \]
Furthermore, we define the $L^\infty$ norm or essential supremum of $f$ as
\[ \|f\|_{L^\infty(E)} = \inf\{M > 0 : m(\{x \in E : |f(x)| > M\}) = 0\}. \]

(We'll refer to them as norms and prove that they actually are norms later.) This Lebesgue integral is always meaningful because $|f|^p$ is nonnegative (though it can be infinite or finite), and this definition should look similar to the $\ell^p$ norm for sequences we defined early on in the course.


Proposition 129 
If $f : E \to \mathbb{C}$ is measurable, then $|f(x)| \le \|f\|_{L^\infty(E)}$ almost everywhere on $E$. Also, if $E = [a, b]$ is a closed interval and $f \in C([a, b])$, then $\|f\|_{L^\infty([a,b])} = \|f\|_\infty$ is the usual sup norm on bounded continuous functions.

These facts are left as exercises for us, and they give us more of a sense of why this norm is a lot like the $\ell^\infty$ norm.


And these next statements are facts that we proved for sequence spaces already: 


Theorem 130 (Hölder's inequality for $L^p$ spaces)
If $1 \le p \le \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, and $f, g : E \to \mathbb{C}$ are measurable functions, then
\[ \int_E |fg| \le \|f\|_{L^p(E)} \|g\|_{L^q(E)}. \]

We prove this in basically the same way as we did for sequences, and then again from Hölder's inequality we obtain Minkowski's inequality:

Theorem 131 (Minkowski's inequality for $L^p$ spaces)

If $1 \le p < \infty$ and $f, g : E \to \mathbb{C}$ are two measurable functions, then $\|f + g\|_{L^p(E)} \le \|f\|_{L^p(E)} + \|g\|_{L^p(E)}$.

A similar result also holds for $L^\infty(E)$, which we can check ourselves.


Fact 132 


We'll use the shorthand $\|\cdot\|_p$ for $\|\cdot\|_{L^p(E)}$ from now on.


Definition 133 


For any $1 \le p \le \infty$, we define the $L^p$ space
\[ L^p(E) = \{f : E \to \mathbb{C} : f \text{ measurable and } \|f\|_p < \infty\}, \]
where we consider two elements $f, g$ of $L^p(E)$ to be equivalent (in other words, the same) if $f = g$ almost everywhere.

We need this last condition to make the $L^p$ norms actually norms, and thus our space is actually a space of equivalence classes rather than functions:
\[ [f] = \{g : E \to \mathbb{C} : \|g\|_p < \infty \text{ and } g = f \text{ a.e.}\}. \]
But we'll still keep referring to elements of this space as functions (as is custom in mathematics). And now our goal will be to show that we have a norm (rather than a seminorm) on $L^p(E)$, and eventually we'll show that these are actually Banach spaces.

Remark 134. This might seem like a weird thing to do, but recall that the rational numbers are constructed as equivalence classes of pairs of integers, and we think of $\frac{3}{2}$ as that quantity rather than the set of $(3x, 2x)$ for nonzero integers $x$. What really matters is the properties of the equivalence class, and for our functions in $L^p(E)$, behavior on a set of measure zero does not matter.


Theorem 135 


The space $L^p(E)$ with pointwise addition and natural scalar multiplication operations is a vector space, and it is a normed vector space under $\|\cdot\|_p$.

Proof sketch. This is the last time we'll refer to elements of $L^p(E)$ as equivalence classes. First of all, notice that the $L^p$ norm $\|\cdot\|_p$ is well-defined, because if $f = g$ almost everywhere (which is the condition for them being in the same equivalence class), then $|f|^p = |g|^p$ almost everywhere, so $\int_E |f|^p = \int_E |g|^p$, and taking $p$th roots tells us that $\|f\|_p = \|g\|_p$.

From there, checking that we have a vector space requires us to check the axioms, but also that scalar multiplication and pointwise addition are actually well-defined: in other words, if we take one representative from $[f_1]$ and add it to a representative from $[f_2]$, we need to make sure that the sum is in the same equivalence class regardless of our choices from $[f_1]$ and $[f_2]$. (And then we'd need to check that kind of result for scalar multiplication as well.) We won't do these checks of well-definedness in detail, but they aren't too difficult to do.

Next, we check properties of the $L^p$ norm. If $\int_E |f|^p = 0$, then $|f|^p = 0$ almost everywhere, meaning that $f = 0$ almost everywhere (and this means that $f$ is in the equivalence class $[0]$). This proves definiteness, and then homogeneity and the triangle inequality follow from the definition and Minkowski's inequality, respectively. (And with this, we can now verify all of the axioms of a vector space, including closure under addition, but that's also left as an exercise to us.)
Proposition 136
Let $E \subset \mathbb{R}$ be measurable. Then $f \in L^p(E)$ if and only if (letting $n$ range over positive integers)
\[ \lim_{n \to \infty} \int_{[-n,n] \cap E} |f|^p < \infty. \]

Proof. We can rewrite our sequence as
\[ \left\{\int_{[-n,n] \cap E} |f|^p\right\}_n = \left\{\int_E \chi_{[-n,n]} |f|^p\right\}_n. \]
We know that $\{\chi_{[-n,n]} |f|^p\}_n$ is a pointwise increasing sequence of measurable functions, and for all $x \in E$ we have
\[ \lim_{n \to \infty} \chi_{[-n,n]}(x)|f(x)|^p = |f(x)|^p. \]
Thus, by the Monotone Convergence Theorem,
\[ \int_E |f|^p = \lim_{n \to \infty} \int_E \chi_{[-n,n]} |f|^p = \lim_{n \to \infty} \int_{[-n,n] \cap E} |f|^p, \]
and thus the two quantities are finite for exactly the same set of $f$s.


Corollary 137 


If $f : \mathbb{R} \to \mathbb{C}$ is a measurable function, and there exists some $C \ge 0$ and $q > 1$ so that for almost every $x \in \mathbb{R}$, we have
\[ |f(x)| \le C(1 + |x|)^{-q}, \]
then $f \in L^p(\mathbb{R})$ for all $p \ge 1$.

Proof. Notice that
\[ \int_{[-n,n]} |f|^p \le \int_{[-n,n]} C^p (1 + |x|)^{-pq} = \int_{-n}^{n} C^p (1 + |x|)^{-pq}\,dx \]
(because the function $(1 + |x|)^{-pq}$ is continuous and thus the Riemann and Lebesgue integrals agree). And now we can check that this integral is at most some finite number $C^p B(p)$ for some constant $B(p)$ depending on $p$, independent of $n$.
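The last step of this proof can be made concrete. Writing $s = pq > 1$, a direct computation gives $\int_{-n}^{n} (1 + |x|)^{-s}\,dx = \frac{2\left(1 - (1 + n)^{1 - s}\right)}{s - 1} \le \frac{2}{s - 1}$, so one admissible choice of the constant is $B(p) = \frac{2}{pq - 1}$. The sketch below checks this closed form against a midpoint-rule approximation; the function names and the sample value $s = 2$ are assumptions made only for this illustration.

```python
def tail_integral_closed_form(s, n):
    """Exact integral of (1 + |x|)^(-s) over [-n, n], valid for s > 1."""
    return 2.0 * (1.0 - (1.0 + n) ** (1.0 - s)) / (s - 1.0)


def tail_integral_midpoint(s, n, steps=200_000):
    """Midpoint-rule approximation; integrates over [0, n] and doubles by symmetry."""
    h = n / steps
    half = sum((1.0 + (k + 0.5) * h) ** (-s) for k in range(steps)) * h
    return 2.0 * half


s = 2.0                           # plays the role of the product p * q
uniform_bound = 2.0 / (s - 1.0)   # candidate for B(p), independent of n
values = [tail_integral_closed_form(s, n) for n in (1, 10, 100, 1000)]
numeric = tail_integral_midpoint(s, 10)
```

The values increase with $n$ but never exceed the uniform bound, which is exactly what Proposition 136 needs in order to conclude $f \in L^p(\mathbb{R})$.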


Proposition 138 
Let $a < b$ and $1 \le p < \infty$, let $f \in L^p([a, b])$, and take some $\varepsilon > 0$. Then there exists some $g \in C([a, b])$ such that $g(a) = g(b) = 0$, so that $\|f - g\|_p < \varepsilon$.

In other words, the space of continuous functions $C([a, b])$ is dense in $L^p([a, b])$, and it's a proper subset because we can find elements in $L^p$ that are not continuous. (This will be left as an exercise to us.)


Theorem 139 (Riesz-Fischer) 


For all $1 \le p \le \infty$, $L^p(E)$ is a Banach space.
Proof. We'll do the case where $p$ is finite ($p = \infty$ will be left as an exercise to us). Recall that a normed space is Banach if and only if every absolutely summable series is summable, and that's what we'll use here. Suppose that $\{f_k\}$ is a sequence of functions in $L^p(E)$ such that
\[ \sum_k \|f_k\|_p = M < \infty. \]
We then want to show that $\sum_k f_k$ converges to some function in $L^p(E)$, meaning that $\sum_{k=1}^n f_k \to f$ in $L^p$ as $n \to \infty$, which can be equivalently written as
\[ \lim_{n \to \infty} \left\|\sum_{k=1}^n f_k - f\right\|_p = 0. \]
To show this, we define the measurable function
\[ g_n(x) = \sum_{k=1}^n |f_k(x)|. \]
By the triangle inequality, we know that if we take norms on both sides, we have
\[ \|g_n\|_p = \left\|\sum_{k=1}^n |f_k|\right\|_p \le \sum_{k=1}^n \|f_k\|_p \le M < \infty. \]


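So if we now use Fatou's lemma, we find that
\[ \int_E \left(\sum_{k=1}^{\infty} |f_k|\right)^p = \int_E \liminf_{n \to \infty} |g_n|^p \le \liminf_{n \to \infty} \int_E |g_n|^p \le M^p \]
because the $L^p$ norm of $g_n$ is always at most $M$. And the function $\left(\sum_{k=1}^{\infty} |f_k|\right)^p$ must be finite almost everywhere (because its integral is finite), and thus $\sum_k |f_k(x)|$ is finite almost everywhere. And this allows us to define the function $f$ pointwise as
\[ f(x) = \begin{cases} \sum_k f_k(x) & \text{if } \sum_k |f_k(x)| \text{ converges,} \\ 0 & \text{otherwise,} \end{cases} \]
and we'll also define the limit $g$ of the $g_n$s as
\[ g(x) = \begin{cases} \sum_k |f_k(x)| & \text{if } \sum_k |f_k(x)| \text{ converges,} \\ 0 & \text{otherwise.} \end{cases} \]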
Then because we've shown pointwise convergence almost everywhere, we have
\[ \lim_{n \to \infty} \left|\sum_{k=1}^n f_k(x) - f(x)\right|^p = 0, \]
and furthermore
\[ \left|\sum_{k=1}^n f_k(x) - f(x)\right|^p \le |g(x)|^p \]
almost everywhere on $E$, because this holds again whenever the infinite sum $\sum_k |f_k(x)|$ converges (the expression inside the absolute value on the left is the negative of the tail $\sum_{k=n+1}^{\infty} f_k(x)$, and then we can use the triangle inequality). So now because $\left\|\sum_k |f_k|\right\|_p \le M$, we also know that $\|g\|_p \le M$ (because those functions agree almost everywhere), and thus $\int_E |g|^p < \infty$.

Now because $|f| \le g$ almost everywhere, $\int_E |f|^p \le \int_E |g|^p < \infty$, so $f \in L^p(E)$ and $f$ can be a candidate for the sum. And we apply the Dominated Convergence Theorem: since we have convergence $\left|\sum_{k=1}^n f_k(x) - f(x)\right|^p \to 0$ pointwise almost everywhere, and this quantity is dominated by the integrable function $|g|^p$, we know that
\[ \lim_{n \to \infty} \int_E \left|\sum_{k=1}^n f_k - f\right|^p = \int_E 0 = 0. \]
Therefore, the absolutely summable series $\{f_k\}$ is indeed summable, and we're done: $L^p$ is indeed a Banach space.


So because $C([a, b])$ is dense in $L^p([a, b])$, and the latter is a Banach space, we can think of the $L^p$ space as a completion of the continuous functions.

From here, we'll move on to more general topics in functional analysis, which may be more intuitive because some 
aspects of it are similar to linear algebra. (Of course, some aspects are different from what we're used to, but often 
we can draw some parallels.) Our next topic will be Hilbert spaces, which give us the important notions of an inner 


product, orthogonality, and so on. 


Definition 140 


A pre-Hilbert space $H$ is a vector space over $\mathbb{C}$ with a Hermitian inner product, which is a map $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{C}$ satisfying the following properties:

1. For all $\lambda_1, \lambda_2 \in \mathbb{C}$ and $v_1, v_2, w \in H$, we have
\[ \langle \lambda_1 v_1 + \lambda_2 v_2, w \rangle = \lambda_1 \langle v_1, w \rangle + \lambda_2 \langle v_2, w \rangle, \]

2. For all $v, w \in H$, we have $\langle v, w \rangle = \overline{\langle w, v \rangle}$,

3. For all $v \in H$, we have $\langle v, v \rangle \ge 0$, with equality if and only if $v = 0$.

We should think of pre-Hilbert spaces as normed vector spaces where the norm comes from an inner product (we'll explain this in just a second). But first, notice that if we have some $v \in H$ such that $\langle v, w \rangle = 0$ for all $w \in H$, then $v = 0$ (take $w = v$ and use property (3)). So the only vector "orthogonal" to everything is the zero vector. Also, points (1) and (2) above show us that
\[ \langle v, \lambda w \rangle = \overline{\langle \lambda w, v \rangle} = \overline{\lambda \langle w, v \rangle} = \overline{\lambda} \langle v, w \rangle, \]
so our inner product is linear in the first variable but conjugate-linear in the second variable.
Definition 141 
Let $H$ be a pre-Hilbert space. Then for any $v \in H$, we define
\[ \|v\| = \langle v, v \rangle^{1/2}. \]

Theorem 142
Let $H$ be a pre-Hilbert space. For all $u, v \in H$, we have
\[ |\langle u, v \rangle| \le \|u\| \|v\|. \]

(This result should look a lot like the Cauchy-Schwarz inequality for finite-dimensional vector spaces.)
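Proof. If $v = 0$, both sides are zero, so assume $v \ne 0$. Define the function $f(t) = \|u + tv\|^2$ for real $t$, which is nonnegative for all $t$ (by definition of the inner product). Notice that
\[ f(t) = \langle u + tv, u + tv \rangle = \langle u, u \rangle + t^2 \langle v, v \rangle + t \langle u, v \rangle + t \langle v, u \rangle \]
can be written as
\[ = \|u\|^2 + t^2 \|v\|^2 + 2t \operatorname{Re}(\langle u, v \rangle). \]
This is a quadratic function of $t$, and it achieves its minimum when its derivative is zero, which occurs (by calculus) when $t_{\min} = -\frac{\operatorname{Re}(\langle u, v \rangle)}{\|v\|^2}$. So plugging this in tells us that
\[ 0 \le f(t_{\min}) = \|u\|^2 - \frac{|\operatorname{Re}(\langle u, v \rangle)|^2}{\|v\|^2}, \]
and now rearranging a bit gives us
\[ |\operatorname{Re}(\langle u, v \rangle)| \le \|u\| \|v\|. \]
This is almost what we want, and to get the rest, suppose that $\langle u, v \rangle \ne 0$ (otherwise the result is already clearly true). Then if we define
\[ \lambda = \frac{\overline{\langle u, v \rangle}}{|\langle u, v \rangle|} \]
so that $|\lambda| = 1$, we find the chain of equalities of real numbers
\[ |\langle u, v \rangle| = \lambda \langle u, v \rangle = \langle \lambda u, v \rangle = \operatorname{Re} \langle \lambda u, v \rangle \le \|\lambda u\| \|v\|, \]
and now because $\langle \lambda u, \lambda u \rangle = \lambda \overline{\lambda} \langle u, u \rangle = \langle u, u \rangle$ (since $|\lambda| = 1$), this simplifies to
\[ = \|u\| \|v\|, \]
as desired.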
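As a sanity check of the inequality just proved, here is a small numerical sketch in $\mathbb{C}^5$, assuming the standard inner product $\langle u, v \rangle = \sum_j u_j \overline{v_j}$ (which is linear in the first slot, as in Definition 140). The helper names are hypothetical. It also checks the known equality case where $v$ is a scalar multiple of $u$.

```python
import math
import random


def inner(u, v):
    """Standard Hermitian inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm(u):
    """Norm induced by the inner product."""
    return math.sqrt(inner(u, u).real)


def random_vector(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]


random.seed(0)  # fixed seed so that the experiment is reproducible
pairs = [(random_vector(5), random_vector(5)) for _ in range(100)]
# Each gap ||u|| ||v|| - |<u, v>| should be nonnegative (up to rounding).
gaps = [norm(u) * norm(v) - abs(inner(u, v)) for u, v in pairs]

# Equality case: v is a scalar multiple of u.
u = random_vector(5)
v = [(2 - 3j) * a for a in u]
equality_gap = norm(u) * norm(v) - abs(inner(u, v))
```

All the gaps come out nonnegative, and the gap for the parallel pair is zero up to floating-point error.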


Next time, we'll use this result to prove that the $\|v\|$ function is actually a norm on a pre-Hilbert space, and we'll then introduce Hilbert spaces (which are basically complete pre-Hilbert spaces). It'll turn out that there are basically only two types of Hilbert spaces (finite-dimensional vector spaces and $\ell^2$), and we'll explain what this means soon!


14. April 8, 2021 


Last time, we introduced the concept of a pre-Hilbert space (a vector space that comes equipped with a Hermitian inner product). This inner product is positive definite, linear in the first argument, and becomes complex conjugated when we swap the two arguments, and we can use this quantity to define
\[ \|v\| = \langle v, v \rangle^{1/2} \]
for any $v$ in the pre-Hilbert space. And we want to show that this is actually a norm. Towards that goal, recall that last time, we showed the Cauchy-Schwarz inequality
\[ |\langle u, v \rangle| \le \|u\| \|v\| \]
for all $u, v$ in the pre-Hilbert space. We'll now put that to use:
Theorem 143

If $H$ is a pre-Hilbert space, then $\|\cdot\|$ is a norm on $H$.

Proof. We need to prove the three properties of the norm. For positive definiteness, note that we do have $\|v\| \ge 0$ for all $v$, and
\[ \|v\| = 0 \iff \langle v, v \rangle = 0 \iff v = 0 \]
because the Hermitian inner product is (defined to be) positive definite. Furthermore, for any $v \in H$ and $\lambda \in \mathbb{C}$, we have
\[ \langle \lambda v, \lambda v \rangle = \lambda \overline{\lambda} \langle v, v \rangle \implies \|\lambda v\| = |\lambda| \|v\| \]
by taking square roots of both sides, which shows homogeneity. So we just need to show the triangle inequality: indeed, if we have $u, v \in H$, then
\[ \|u + v\|^2 = \langle u + v, u + v \rangle = \|u\|^2 + \|v\|^2 + 2\operatorname{Re}(\langle u, v \rangle). \]
Because $\operatorname{Re}(z) \le |z|$ and using the Cauchy-Schwarz inequality, this can be bounded by
\[ \le \|u\|^2 + \|v\|^2 + 2|\langle u, v \rangle| \le \|u\|^2 + \|v\|^2 + 2\|u\| \|v\| = (\|u\| + \|v\|)^2, \]
and now taking square roots of both sides yields the desired inequality.


The Cauchy-Schwarz inequality can also help us in other ways: 


Theorem 144 (Continuity of the inner product) 


If $u_n \to u$ and $v_n \to v$ in a pre-Hilbert space equipped with the norm $\|\cdot\|$, then $\langle u_n, v_n \rangle \to \langle u, v \rangle$.

Proof. Notice that if $u_n \to u$ and $v_n \to v$, that means that $\|u_n - u\| \to 0$ and $\|v_n - v\| \to 0$ as $n \to \infty$. Therefore, we can bound
\[ |\langle u_n, v_n \rangle - \langle u, v \rangle| = |\langle u_n, v_n \rangle - \langle u, v_n \rangle + \langle u, v_n \rangle - \langle u, v \rangle|, \]
and factoring and using the triangle inequality for $\mathbb{C}$ gives us
\[ = |\langle u_n - u, v_n \rangle + \langle u, v_n - v \rangle| \le |\langle u_n - u, v_n \rangle| + |\langle u, v_n - v \rangle|. \]
The Cauchy-Schwarz inequality then allows us to bound this by
\[ \le \|u_n - u\| \cdot \|v_n\| + \|u\| \cdot \|v_n - v\|, \]
and now because $v_n \to v$ we know that $\|v_n\| \to \|v\|$, and this convergent sequence of real numbers must be bounded. Thus, our new bound is
\[ \le \|u_n - u\| \cdot \sup_n \|v_n\| + \|u\| \cdot \|v_n - v\|, \]
and now because $\|u_n - u\|, \|v_n - v\| \to 0$, the linear combination of them given above also converges to 0, and we're done. Thus $\langle u_n, v_n \rangle$ indeed converges to $\langle u, v \rangle$ (by the squeeze theorem).


Definition 145 


A Hilbert space is a pre-Hilbert space that is complete with respect to the norm $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$.
Example 146

The space of $n$-tuples of complex numbers $\mathbb{C}^n$ with inner product $\langle z, w \rangle = \sum_{j=1}^n z_j \overline{w_j}$ is a (finite-dimensional) Hilbert space.

Example 147

The space $\ell^2 = \{a : \sum_n |a_n|^2 < \infty\}$ is a Hilbert space, where we define
\[ \langle a, b \rangle = \sum_{k=1}^{\infty} a_k \overline{b_k}. \]

In this latter example, we can check that $\langle a, a \rangle^{1/2}$ coincides with the $\ell^2$ norm $\|a\|_2$. And it turns out that every separable Hilbert space (which are the ones that we'll primarily care about) can be mapped in an isometric way to one of these two examples, so the examples above are basically the two main types of Hilbert spaces we'll often be seeing!


But here's another one that we'll see often: 


Example 148 
Let $E \subset \mathbb{R}$ be measurable. Then $L^2(E)$, the space of measurable functions $f : E \to \mathbb{C}$ with $\int_E |f|^2 < \infty$, is a Hilbert space with inner product
\[ \langle f, g \rangle = \int_E f \overline{g}. \]

We might notice that we focused on $\ell^2$ and $L^2$, and that's because the inner product only induces the $\ell^2$ (or $L^2$) norm in the way that it's written right now. But we might want to ask whether there's an inner product that we could put on the other $\ell^p$ or $L^p$ so that they are also Hilbert spaces (so that we get out the appropriate norm), and the answer turns out to be no. We'll see that through the following result:


Proposition 149 (Parallelogram law) 


Let $H$ be a pre-Hilbert space. Then for any $u, v \in H$, we have
\[ \|u + v\|^2 + \|u - v\|^2 = 2\left(\|u\|^2 + \|v\|^2\right). \]

In addition, if $H$ is a normed vector space satisfying this equality, then $H$ is a pre-Hilbert space.
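The parallelogram law is easy to test numerically, and doing so shows how it singles out $p = 2$ among the $p$-norms. The following minimal sketch works in $\mathbb{R}^2$ with the $\ell^p$ norm and the vectors $u = (1, 0)$, $v = (0, 1)$, for which the left side minus the right side equals $2 \cdot 2^{2/p} - 4$. The helper names `p_norm` and `parallelogram_gap` are hypothetical.

```python
def p_norm(x, p):
    """The l^p norm of a finite sequence x, for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def parallelogram_gap(u, v, p):
    """Left side minus right side of the parallelogram law in the p-norm."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    lhs = p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
    rhs = 2.0 * (p_norm(u, p) ** 2 + p_norm(v, p) ** 2)
    return lhs - rhs


u, v = (1.0, 0.0), (0.0, 1.0)
gap_p1 = parallelogram_gap(u, v, 1)   # 2 * 4 - 4 = 4
gap_p2 = parallelogram_gap(u, v, 2)   # 2 * 2 - 4 = 0
gap_p4 = parallelogram_gap(u, v, 4)   # 2 * sqrt(2) - 4 < 0
```

Only $p = 2$ gives a zero gap for this pair, so for $p \ne 2$ the $p$-norm on $\mathbb{R}^2$ cannot come from an inner product. The same two vectors, padded with zeros, give the corresponding statement in $\ell^p$.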


We can use the parallelogram law (which can be verified by computation) to see that there are always $u, v$ which make this equality fail if $p \ne 2$ for the $\ell^p$ and $L^p$ spaces. And now that we have this inner product, we can start doing more work in the "linear algebra" flavor:


Definition 150 


Let $H$ be a pre-Hilbert space. Two elements $u, v \in H$ are orthogonal if $\langle u, v \rangle = 0$ (also denoted $u \perp v$), and a subset $\{e_\lambda\}_{\lambda \in \Lambda} \subset H$ is orthonormal if $\|e_\lambda\| = 1$ for all $\lambda \in \Lambda$ and for all $\lambda_1 \ne \lambda_2$, $\langle e_{\lambda_1}, e_{\lambda_2} \rangle = 0$.

Remark 151. We may notice that the index set we're using is some arbitrary set $\Lambda$ instead of $\mathbb{N}$: we'll mainly be interested in the case where we have a finite or countably infinite orthonormal set, but the definition makes sense more generally.
We'll see some examples corresponding to each of the examples of Hilbert spaces $\mathbb{C}^n$, $\ell^2$, $L^2$ that we described above:

Example 152
The set $\{(0, 1), (1, 0)\}$ is an orthonormal set in $\mathbb{C}^2$, and $\{(0, 0, 1), (0, 1, 0)\}$ is an orthonormal set in $\mathbb{C}^3$.

Example 153
Let $e_n$ be the sequence which is 1 in the $n$th entry and 0 everywhere else. Then $\{e_n\}_{n \ge 1}$ is an orthonormal subset of $\ell^2$.

Example 154

The functions $f_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$ (as elements of $L^2([-\pi, \pi])$) form an orthonormal subset of $L^2([-\pi, \pi])$. (This is because the integral $\int_{-\pi}^{\pi} e^{imx} \overline{e^{inx}}\,dx = \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx$ is zero unless $m = n$; if we're uncomfortable integrating a complex exponential, we can break it up into its real and imaginary parts by Euler's formula.)
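The orthonormality of the complex exponentials can also be checked numerically. The sketch below approximates the $L^2([-\pi, \pi])$ inner product by a midpoint rule on equally spaced points, which is an assumption made for illustration (the lecture computes the integral exactly). The names `f` and `inner` are hypothetical.

```python
import cmath
import math


def f(n, x):
    """The function f_n(x) = exp(i n x) / sqrt(2 pi)."""
    return cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)


def inner(m, n, steps=2000):
    """Midpoint-rule approximation of the integral of f_m * conj(f_n) over [-pi, pi]."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(m, x) * f(n, x).conjugate()
    return total * h


# Gram matrix of f_{-2}, ..., f_2: it should be close to the identity matrix.
indices = range(-2, 3)
gram = [[inner(m, n) for n in indices] for m in indices]
```

The diagonal entries come out as 1 and the off-diagonal entries as 0, up to rounding, matching the claim that the integral vanishes unless $m = n$.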


Notice that we haven't talked about whether the spaces $\ell^2$ and $L^2$ are separable, but it was an exercise for us to show that the continuous functions are dense in $L^p$ (for all $p < \infty$), and the Weierstrass approximation theorem tells us that continuous functions on a closed and bounded interval can be uniformly approximated by polynomials. So the polynomials are dense in $L^p$, and to get to a countable dense subset, we only consider the polynomials with rational coefficients, and there are indeed countably many of those. So for all $L^p$ with $p$ finite, the polynomials with coefficients of the form $q_1 + iq_2$ for rational $q_1, q_2$ form a countable dense subset of $L^p([a, b])$, and thus those $L^p$ spaces are separable. And the set of sequences which terminate after some point forms a dense subset in $\ell^p$ for any finite $p$ as well, so we can get our countable dense subset of $\ell^p$ by looking at the set of sequences of rationals that terminate eventually!


Theorem 155 (Bessel) 
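Let $\{e_n\}$ be a countable (finite or countably infinite) orthonormal subset of a pre-Hilbert space $H$. Then for all $u \in H$, we have
\[ \sum_n |\langle u, e_n \rangle|^2 \le \|u\|^2. \]

Proof. First, we do the finite case. If $\{e_n\}_{n=1}^N$ is a finite collection of orthonormal vectors in $H$, we can verify that
\[ \left\|\sum_{n=1}^N \langle u, e_n \rangle e_n\right\|^2 = \left\langle \sum_n \langle u, e_n \rangle e_n, \sum_m \langle u, e_m \rangle e_m \right\rangle, \]
and we can pull out some numbers to write this as
\[ = \sum_{n,m} \langle u, e_n \rangle \overline{\langle u, e_m \rangle} \langle e_n, e_m \rangle. \]
By orthonormality, the inner product $\langle e_n, e_m \rangle$ is only nonzero when $n = m$ (in which case it's equal to 1), so that we end up with $\sum_{n=1}^N |\langle u, e_n \rangle|^2$. We can also say by linearity that
\[ \left\langle u, \sum_{n=1}^N \langle u, e_n \rangle e_n \right\rangle = \sum_{n=1}^N \overline{\langle u, e_n \rangle} \langle u, e_n \rangle = \sum_{n=1}^N |\langle u, e_n \rangle|^2. \]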
From here, note that
\[ 0 \le \left\|u - \sum_{n=1}^N \langle u, e_n \rangle e_n\right\|^2, \]
where the term inside the norm can be thought of as the projection of $u$ onto the orthogonal space to the $e_n$s. We can then rewrite this by expanding in the same way we previously did for $\|u + v\|^2$, and we get
\[ 0 \le \|u\|^2 + \left\|\sum_{n=1}^N \langle u, e_n \rangle e_n\right\|^2 - 2\operatorname{Re}\left\langle u, \sum_{n=1}^N \langle u, e_n \rangle e_n \right\rangle. \]
Both of the last two terms now just give us multiples of $\sum_{n=1}^N |\langle u, e_n \rangle|^2$ by our work above, and we end up with
\[ 0 \le \|u\|^2 + \sum_{n=1}^N |\langle u, e_n \rangle|^2 - 2\sum_{n=1}^N |\langle u, e_n \rangle|^2 = \|u\|^2 - \sum_{n=1}^N |\langle u, e_n \rangle|^2, \]
and this is exactly what we want to show for the finite case. And the infinite case follows by taking $N \to \infty$: more formally, if $\{e_n\}_{n=1}^{\infty}$ is an orthonormal subset of $H$, then
\[ \sum_{n=1}^{\infty} |\langle u, e_n \rangle|^2 = \lim_{N \to \infty} \sum_{n=1}^N |\langle u, e_n \rangle|^2 \le \|u\|^2, \]
and this proves the result that we want for all countable orthonormal subsets of $H$.
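Bessel's inequality can be illustrated with a small computation in $\mathbb{C}^3$. The sketch below assumes the standard inner product on $\mathbb{C}^3$ and uses an orthonormal set with only two vectors, so the inequality is strict for a vector with a component outside their span. The particular vectors `e1`, `e2`, `u` are arbitrary choices made for this illustration.

```python
import math


def inner(u, v):
    """Standard Hermitian inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


s = 1 / math.sqrt(2)
e1 = [1, 0, 0]
e2 = [0, s, 1j * s]   # unit vector orthogonal to e1
u = [1 + 2j, 3, -1j]

# Coefficients <u, e_n> are 1 + 2i and sqrt(2), so the sum of squares is 5 + 2 = 7.
bessel_sum = sum(abs(inner(u, e)) ** 2 for e in (e1, e2))
norm_squared = inner(u, u).real   # 5 + 9 + 1 = 15
```

Here the sum of squared coefficients is 7 while $\|u\|^2 = 15$. The missing 8 is the squared norm of the part of $u$ orthogonal to both $e_1$ and $e_2$, which is the quantity shown to be nonnegative in the proof.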


Orthonormal sets are not the most useful thing on their own for studying a pre-Hilbert space H, since we might 


leave out some vectors in our span. That motivates this next definition: 


Definition 156 


An orthonormal subset $\{e_\lambda\}_{\lambda \in \Lambda}$ of a pre-Hilbert space $H$ is maximal if the only vector $u \in H$ satisfying $\langle u, e_\lambda\rangle = 0$ for all $\lambda \in \Lambda$ is $u = 0$.


Example 157 
The $n$ standard basis vectors in $\mathbb{C}^n$ form a maximal orthonormal subset. (A non-example would be any proper subset of that set.)


Example 158 


Our example $\{e_n\}$ of sequences from above is a maximal orthonormal subset of $\ell^2$.


We'll soon see that a countably infinite maximal orthonormal subset basically serves the same purpose as an orthonormal basis does in linear algebra, but not every element can be written as a finite linear combination of the orthonormal subset elements (as is possible with a Hamel basis).


Theorem 159 


Every nontrivial pre-Hilbert space has a maximal orthonormal subset. 


We can prove this result by using Zorn's lemma and thinking of subsets as being ordered by inclusion. But if that scares us (because of the use of the Axiom of Choice), we can prove a slightly weaker statement by hand:

Theorem 160


Every nontrivial separable pre-Hilbert space $H$ has a countable maximal orthonormal subset.


(Recall that a space is separable if it has a countable dense subset.)


Proof. We'll use the Gram-Schmidt process from linear algebra as follows. Because $H$ is separable, we can let $\{v_j\}_{j=1}^\infty$ be a countable dense subset of $H$ such that $\|v_1\| \ne 0$.

We claim that for all $n \in \mathbb{N}$, there exists a natural number $m(n)$ and an orthonormal subset $\{e_1, \cdots, e_{m(n)}\}$ so that the span of this subset is the span of $\{v_1, \cdots, v_n\}$, and $\{e_1, \cdots, e_{m(n+1)}\}$ is the union of $\{e_1, \cdots, e_{m(n)}\}$ and either the empty set (if $v_{n+1}$ is already in the span) or some vector $e_{m(n+1)}$ (otherwise). In other words, we can come up with a finite orthonormal subset that has the same span as the first $n$ vectors of our countable dense subset, and we can keep constructing this iteratively by adding at most one element.

We'll prove this claim by induction. For the base case $n = 1$, we can take $e_1 = \frac{v_1}{\|v_1\|}$, which indeed satisfies all of the properties we want. Now for the inductive step, suppose that our claim holds for $n = k$, and now we want to span $v_1$ through $v_{k+1}$ instead of just $v_1$ through $v_k$. If $v_{k+1}$ is already in the span of $\{v_1, \cdots, v_k\}$, then the span of $\{e_1, \cdots, e_{m(k)}\}$ is the same as the span of $\{v_1, \cdots, v_k\}$, which is the same as the span of $\{v_1, \cdots, v_{k+1}\}$. So in this case, we don't need to add anything, and all of our conditions are still satisfied. Otherwise, $v_{k+1}$ is not in the span of $\{v_1, \cdots, v_k\}$, and we'll define

\[ w_{k+1} = v_{k+1} - \sum_{j=1}^{m(k)} \langle v_{k+1}, e_j\rangle e_j \]

to be $v_{k+1}$ with its components along the existing $e_j$s subtracted off. This vector is not zero, or else $v_{k+1}$ would be in the span of the existing $e_j$s and thus in the span of the existing $v_j$s. We then define the normalized version $e_{m(k+1)} = \frac{w_{k+1}}{\|w_{k+1}\|}$ to add to our orthonormal subset: this is a unit vector by construction, and for any $1 \le \ell \le m(k)$ we can indeed check orthogonality:

\[ \langle e_{m(k+1)}, e_\ell\rangle = \frac{1}{\|w_{k+1}\|} \left\langle v_{k+1} - \sum_{j=1}^{m(k)} \langle v_{k+1}, e_j\rangle e_j, \ e_\ell \right\rangle \]

now simplifies because the first $m(k)$ $e$'s are already orthonormal: we just pick out $j = \ell$ from the sum and we have

\[ = \frac{1}{\|w_{k+1}\|} \left( \langle v_{k+1}, e_\ell\rangle - \langle v_{k+1}, e_\ell\rangle \right) = 0. \]

Therefore, $\{e_1, \cdots, e_{m(k)}, e_{m(k+1)}\}$ is indeed an orthonormal subset, and that proves the desired claim.


It now remains to show that the collection of all $e_\ell$s forms a maximal orthonormal subset. We define the set

\[ S = \bigcup_{n=1}^\infty \{e_1, \cdots, e_{m(n)}\}; \]

this is an orthonormal subset of $H$ which can be finite or countably infinite, and we want to show that $S$ is maximal. And here is where we use the fact that the $v_j$s are dense in $H$: suppose that we have some $u \in H$ so that $\langle u, e_\ell\rangle = 0$ for all $\ell$. Then we can find a sequence of elements $\{v_{j(k)}\}_k$ such that

\[ \lim_{k\to\infty} v_{j(k)} = u. \]

Because the span of the $v_j$s and the $e_\ell$s are the same, we know that each $v_{j(k)}$ is in the span of $\{e_1, \cdots, e_{m(j(k))}\}$, so now

\[ \|v_{j(k)}\|^2 = \sum_{\ell=1}^{m(j(k))} |\langle v_{j(k)}, e_\ell\rangle|^2 \]

(we have equality for a finite set of such orthonormal elements), and since $\langle u, e_\ell\rangle = 0$ for every $\ell$ we can rewrite this as

\[ = \sum_{\ell=1}^{m(j(k))} |\langle v_{j(k)} - u, e_\ell\rangle|^2 \le \|v_{j(k)} - u\|^2 \]

by Bessel's inequality. But because $v_{j(k)} \to u$ by construction, this means that $\|v_{j(k)}\| \to 0$, and thus the limit of the $v_{j(k)}$s, which is $u$, must be zero. That proves that our orthonormal subset $S$ is indeed maximal.
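The inductive construction in this proof can be sketched in finite dimensions as follows. This is only an illustration under assumptions: the function name `gram_schmidt`, the tolerance `tol` (a numerical stand-in for the exact test $w_{k+1} \ne 0$), and the sample vectors are all hypothetical choices, not part of the lecture.

```python
import math

def inner(u, v):
    """Inner product on R^n or C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal list with the same span as `vectors`.

    Each new vector either contributes one new orthonormal element or is
    skipped when it already lies in the span of the earlier ones.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(v, e)  # coefficient <v, e_j>, as in the proof
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:  # w is (numerically) nonzero: add its normalization
            basis.append([wi / norm for wi in w])
    return basis

# The second and fourth vectors depend linearly on the others.
vs = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
basis = gram_schmidt(vs)
print(len(basis))
```

In the infinite-dimensional proof the list of vectors is the countable dense subset, so the loop never ends; the sketch only mirrors what happens after finitely many steps.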


Next time, we'll understand more specifically what it means for these maximal orthonormal subsets to serve as 


replacements for bases from linear algebra! 


15 April 13, 2021 


We'll discuss orthonormal bases of a Hilbert space today. Last time, we defined an orthonormal set $\{e_\lambda\}_{\lambda \in \Lambda}$ of elements to be maximal if whenever $\langle u, e_\lambda\rangle = 0$ for all $\lambda$, we have $u = 0$. We proved that if we have a separable Hilbert space, then it has a countable maximal orthonormal subset (and we showed this using the Gram-Schmidt process and Bessel's inequality). Such subsets are important in our study here:


Definition 161 


Let $H$ be a Hilbert space. An orthonormal basis of $H$ is a countable maximal orthonormal subset $\{e_n\}$ of $H$.


Many of the examples we've encountered so far, like $\mathbb{C}^n$, $\ell^2$, and $L^2$, are indeed separable and thus have an orthonormal basis. And the reason that we call such sets bases, like in linear algebra, is that we can draw an analogy between the two definitions:


Theorem 162 


Let $\{e_n\}$ be an orthonormal basis in a Hilbert space $H$. Then for all $u \in H$, we have convergence of the Fourier-Bessel series

\[ \lim_{m\to\infty} \sum_{n=1}^m \langle u, e_n\rangle e_n = \sum_{n=1}^\infty \langle u, e_n\rangle e_n = u. \]


So just like in finite-dimensional linear algebra, we can write any element as a linear combination of the basis 


elements, but we may need an infinite number of elements to do so here. 


Proof. First, we will show that the sequence of partial sums $\left\{ \sum_{n=1}^m \langle u, e_n\rangle e_n \right\}_m$ is a Cauchy sequence. Since we know that $\sum_{n=1}^\infty |\langle u, e_n\rangle|^2$ converges by Bessel's inequality (it's bounded by $\|u\|^2$), the partial sums must be a Cauchy sequence of nonnegative real numbers. Thus for any $\varepsilon > 0$, there exists some $M$ such that for all $N \ge M$,

\[ \sum_{n=N+1}^\infty |\langle u, e_n\rangle|^2 < \varepsilon^2. \]
Thus, for any $m > \ell \ge M$, we can compute

\[ \left\| \sum_{n=1}^m \langle u, e_n\rangle e_n - \sum_{n=1}^\ell \langle u, e_n\rangle e_n \right\|^2 = \sum_{n=\ell+1}^m |\langle u, e_n\rangle|^2 \]

by expanding out the square $\|v\|^2 = \langle v, v\rangle$ and using orthonormality, and now this is bounded by

\[ \le \sum_{n=\ell+1}^\infty |\langle u, e_n\rangle|^2 < \varepsilon^2. \]

So for any $\varepsilon$, the squared norm of the difference between partial sums goes to 0 as we go far enough into the sequence, which proves that we do have a Cauchy sequence in our Hilbert space. Since $H$ is complete, there then exists some $u' \in H$ so that

\[ u' = \lim_{m\to\infty} \sum_{n=1}^m \langle u, e_n\rangle e_n. \]


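We want to show that $u' = u$, and we will do this by showing that $\langle u - u', e_\ell\rangle = 0$ for all $\ell$. By continuity of the inner product, we know that for all $\ell \in \mathbb{N}$, we have

\[ \langle u - u', e_\ell\rangle = \lim_{m\to\infty} \left\langle u - \sum_{n=1}^m \langle u, e_n\rangle e_n, \ e_\ell \right\rangle, \]

and this simplifies by linearity to

\[ = \lim_{m\to\infty} \left( \langle u, e_\ell\rangle - \sum_{n=1}^m \langle u, e_n\rangle \langle e_n, e_\ell\rangle \right), \]

but by orthonormality only the $n = \ell$ term of the sum is nonzero (once $m \ge \ell$), so this simplifies to

\[ \langle u, e_\ell\rangle - \langle u, e_\ell\rangle \cdot 1 = 0, \]

which proves the result because $\langle u - u', e_\ell\rangle = 0$ for all $\ell$ if and only if $u - u' = 0$ by maximality.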


So if we have an orthonormal basis, every element can be expanded in this series in terms of the orthonormal basis elements. We saw last time that every separable Hilbert space $H$ has an orthonormal basis, and the converse is also true:


Corollary 163 


If a Hilbert space H has an orthonormal basis, then H is separable. 


Proof. Suppose that $\{e_n\}_n$ is an orthonormal basis for $H$. Define the set

\[ S = \bigcup_{m \in \mathbb{N}} \left\{ \sum_{n=1}^m q_n e_n : q_1, \cdots, q_m \in \mathbb{Q} + i\mathbb{Q} \right\}. \]

This is a countable subset of $H$, because the elements in each component indexed by $m$ are in bijection with $\mathbb{Q}^{2m}$, which is countable, and then we take a countable union over $m$. So now by Theorem 162, $S$ is dense in $H$, because every element $u$ can be expanded in the Fourier-Bessel series above, so the partial sums converge to $u$, and thus for any $\varepsilon > 0$, we can take a sufficiently long partial sum of length $L$ and get within $\frac{\varepsilon}{2}$ of $u$, and then approximate each coefficient with a rational number (in both real and imaginary parts) that is sufficiently close, and that eventual finite-length partial sum will indeed be in one of the parts of the $S$ we defined. So $S$ is indeed a countable dense subset of $H$, and we're done.


We can now strengthen Bessel’s inequality, which held for any orthonormal subset, with our new definition: 




Theorem 164 (Parseval’s identity) 
Let $H$ be a Hilbert space, and let $\{e_n\}$ be a countable orthonormal basis of $H$. Then for all $u \in H$,

\[ \sum_n |\langle u, e_n\rangle|^2 = \|u\|^2. \]

(In Bessel's inequality, we only had an inequality $\le$ in the expression above!)


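Proof. We know that

\[ u = \sum_n \langle u, e_n\rangle e_n, \]

so if the sum over $n$ is a finite sum, the result follows immediately by expanding out the inner product $\|u\|^2 = \langle u, u\rangle$. Otherwise, by continuity of the inner product, we can write

\[ \|u\|^2 = \lim_{m\to\infty} \left\langle \sum_{n=1}^m \langle u, e_n\rangle e_n, \ \sum_{\ell=1}^m \langle u, e_\ell\rangle e_\ell \right\rangle, \]

and we can move the constants out (with a complex conjugate for one of them) and rearrange sums to get

\[ = \lim_{m\to\infty} \sum_{n,\ell=1}^m \langle u, e_n\rangle \overline{\langle u, e_\ell\rangle} \langle e_n, e_\ell\rangle. \]

Again, orthonormality only picks up the term where $n = \ell$, so we're left with

\[ = \lim_{m\to\infty} \sum_{n=1}^m \langle u, e_n\rangle \overline{\langle u, e_n\rangle} = \sum_{n=1}^\infty |\langle u, e_n\rangle|^2, \]

and this last expression is the left-hand side of Parseval's identity.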


We now actually have a way to identify every separable Hilbert space with the one that was introduced to us at 


the beginning of class: 


Theorem 165 


If $H$ is an infinite-dimensional separable Hilbert space, then $H$ is isometrically isomorphic to $\ell^2$. In other words, there exists a bijective (bounded) linear operator $T : H \to \ell^2$ so that for all $u, v \in H$, $\|Tu\|_{\ell^2} = \|u\|_H$ and $\langle Tu, Tv\rangle_{\ell^2} = \langle u, v\rangle_H$.


(The finite-dimensional case is easier to deal with: we can show that those Hilbert spaces are isometrically isomorphic to $\mathbb{C}^n$ for some $n$.)


Proof sketch. Since $H$ is a separable Hilbert space, it has an orthonormal basis $\{e_n\}_{n \in \mathbb{N}}$, and by Theorem 162, we must have

\[ u = \sum_{n=1}^\infty \langle u, e_n\rangle e_n \]

for all $u \in H$, which implies that

\[ \|u\|^2 = \sum_{n=1}^\infty |\langle u, e_n\rangle|^2. \]

So we'll define our map $T$ via

\[ Tu = \{\langle u, e_n\rangle\}_{n=1}^\infty; \]

in other words, $Tu$ is the sequence of coefficients showing up in the expansion by orthonormal basis, and this sequence is in $\ell^2$ by the identity we wrote down above. We can check that $T$ indeed satisfies all of the necessary conditions: it's linear in $u$, it's surjective because every sum $\sum_{n=1}^\infty c_n e_n$ with $\{c_n\} \in \ell^2$ is Cauchy in $H$, and it's one-to-one because every $u$ is expanded in this way, meaning that if two expansions are the same the evaluations of the infinite sums must also be the same.


We can now use this theory that we've been discussing in a more concrete setting, focusing on the specific example 


of Fourier series. 


Proposition 166 


The subset of functions $\left\{ \frac{e^{inx}}{\sqrt{2\pi}} \right\}_{n \in \mathbb{Z}}$ is an orthonormal subset of $L^2([-\pi, \pi])$.


(If we're uncomfortable working with complex exponentials, we can define $e^{ix} = \cos x + i \sin x$ and work out all of the necessary properties: everything that we expect for exponentials remains true.)


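Proof. Notice that

\[ \langle e^{inx}, e^{imx}\rangle = \int_{-\pi}^{\pi} e^{inx} \overline{e^{imx}} \, dx = \int_{-\pi}^{\pi} e^{i(n-m)x} \, dx \]

is equal to $2\pi$ when $n = m$ (since the integrand is 1) and otherwise it is $\frac{1}{i(n-m)} e^{i(n-m)x} \Big|_{-\pi}^{\pi} = 0$ because the exponential is always $2\pi$-periodic in $x$. So normalizing each function by $\sqrt{2\pi}$ indeed gives us the desired

\[ \left\langle \frac{e^{inx}}{\sqrt{2\pi}}, \frac{e^{imx}}{\sqrt{2\pi}} \right\rangle = \begin{cases} 1 & m = n \\ 0 & m \ne n. \end{cases} \]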
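The orthonormality relation can also be checked numerically. The sketch below approximates the $L^2([-\pi, \pi])$ inner product by a Riemann sum on a uniform grid; the function name `inner_exp` and the sample count are illustrative assumptions. For these particular integrands the uniform Riemann sum is exact up to rounding as long as $|n - m|$ is smaller than the number of samples.

```python
import cmath
import math

def inner_exp(n, m, samples=256):
    """Approximate <e_n, e_m> for e_n(x) = exp(inx) / sqrt(2 pi) on [-pi, pi]."""
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        x = -math.pi + k * h
        e_n = cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)
        e_m = cmath.exp(1j * m * x) / math.sqrt(2 * math.pi)
        total += e_n * e_m.conjugate() * h  # conjugate on the second slot
    return total

print(abs(inner_exp(3, 3)), abs(inner_exp(3, -2)))
```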


Definition 167 


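For a function $f \in L^2([-\pi, \pi])$, the Fourier coefficient $\hat{f}(n)$ of $f$ is given by

\[ \hat{f}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt, \]

and the $N$th partial Fourier sum is

\[ S_N f(x) = \sum_{|n| \le N} \hat{f}(n) e^{inx} = \sum_{|n| \le N} \left\langle f, \frac{e^{inx}}{\sqrt{2\pi}} \right\rangle \frac{e^{inx}}{\sqrt{2\pi}}. \]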


We can then look at the limit of the partial sums, but we're not going to make any claims about convergence here 


yet: 


Definition 168 


The Fourier series of $f$ is the formal series $\sum_{n \in \mathbb{Z}} \hat{f}(n) e^{inx}$.
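To make the definitions of $\hat{f}(n)$ and $S_N f$ concrete, here is a small numerical sketch. The helper names `fourier_coefficient` and `partial_sum`, the Riemann-sum quadrature, and the test function $f(t) = 1 + 2\cos(3t)$ are all assumptions made for illustration. Since this $f$ is a trigonometric polynomial, its only nonzero coefficients are $\hat{f}(0) = \hat{f}(3) = \hat{f}(-3) = 1$, so $S_N f = f$ as soon as $N \ge 3$.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=512):
    """Approximate (1 / (2 pi)) * integral of f(t) exp(-int) over [-pi, pi]."""
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        t = -math.pi + k * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

def partial_sum(f, N, x, samples=512):
    """Evaluate S_N f(x) = sum over |n| <= N of f_hat(n) exp(inx)."""
    return sum(fourier_coefficient(f, n, samples) * cmath.exp(1j * n * x)
               for n in range(-N, N + 1))

def f(t):
    return 1 + 2 * math.cos(3 * t)

x0 = 0.7
print(partial_sum(f, 2, x0).real, partial_sum(f, 3, x0).real, f(x0))
```

For a general $f \in L^2$ the quadrature is only approximate, and whether $S_N f$ approaches $f$ is exactly the question taken up next.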


The motivating question for Fourier when first studying these objects was whether or not all continuous functions could be expanded in this Fourier series manner. Trying to study things on a pointwise convergence level is difficult, but the space we should really be viewing this setup within is the $L^2$ space, and there we'll be able to get some results.


The problem we're trying to resolve is as follows:

Problem 169


Does the convergence (in $L^2$ norm) $\sum_{|n| \le N} \hat{f}(n) e^{inx} \to f$ hold for all $f \in L^2([-\pi, \pi])$? In other words, does

\[ \|f - S_N f\|_2 = \left( \int_{-\pi}^{\pi} |f(x) - S_N f(x)|^2 \, dx \right)^{1/2} \]

converge to 0 as $N \to \infty$?


We'll rephrase this equivalently as follows: we want to know whether $\left\{ \frac{e^{inx}}{\sqrt{2\pi}} \right\}_{n \in \mathbb{Z}}$ is a maximal orthonormal subset in $L^2([-\pi, \pi])$, which is equivalent to showing that

\[ \hat{f}(n) = 0 \ \ \forall n \in \mathbb{Z} \implies f = 0. \]

We already know that if we have an orthonormal basis, then we can indeed make this infinite expansion for any element of the space $L^2$. So this rephrasing in terms of the language of Hilbert spaces will help us out here (and we should remember that we require the completeness of $L^2$ to get to this rephrased problem statement). The answer to our problem turns out to be yes, but it'll take us a bit of work to get there.


Proposition 170 
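For all $f \in L^2([-\pi, \pi])$ and all $N \in \mathbb{Z}_{\ge 0}$, we have $S_N f(x) = \int_{-\pi}^{\pi} D_N(x - t) f(t) \, dt$, where

\[ D_N(x) = \begin{cases} \frac{2N+1}{2\pi} & x = 0 \\ \frac{\sin\left( \left(N + \frac{1}{2}\right) x \right)}{2\pi \sin \frac{x}{2}} & x \ne 0. \end{cases} \]

We can check that the function $D_N$ is continuous (and in fact smooth), and it is called the Dirichlet kernel. The proof of this will be basically a warm-up calculation in preparation for some other calculations to come: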
Proof. For any $f \in L^2([-\pi, \pi])$, we know that

\[ S_N f(x) = \sum_{|n| \le N} \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt \right) e^{inx} = \int_{-\pi}^{\pi} f(t) \left( \frac{1}{2\pi} \sum_{|n| \le N} e^{in(x-t)} \right) dt \]

by linearity of the Lebesgue integral (even though we're using the Riemann notation, integrals are always Lebesgue here). The term in parentheses is the function $D_N(x - t)$, where

\[ D_N(x) = \frac{1}{2\pi} \sum_{|n| \le N} e^{inx} = \frac{1}{2\pi} e^{-iNx} \sum_{n=0}^{2N} e^{inx}. \]

This is a geometric series with ratio $e^{ix}$, so this evaluates to

\[ \frac{1}{2\pi} e^{-iNx} \, \frac{1 - e^{i(2N+1)x}}{1 - e^{ix}} \]

whenever $e^{ix} \ne 1$. That happens whenever $x \ne 0$ (this is the only value within the range $(-2\pi, 2\pi)$ that it needs to be defined on), and when $x = 0$ the original geometric series is clearly $\frac{2N+1}{2\pi}$. So now for the $x \ne 0$ case, we can simplify this expression some more to

\[ \frac{1}{2\pi} \, \frac{e^{i(N+1/2)x} - e^{-i(N+1/2)x}}{e^{ix/2} - e^{-ix/2}}, \]

and now because $\sin x = \frac{e^{ix} - e^{-ix}}{2i}$, we can rewrite this as

\[ \frac{1}{2\pi} \, \frac{2i \sin\left( \left(N + \frac{1}{2}\right) x \right)}{2i \sin \frac{x}{2}}, \]

and canceling out the $2i$s gives us the desired expression for $D_N$ above.
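The closed form can be compared against the defining exponential sum numerically. This is a minimal sketch; the function names and the sample points are illustrative assumptions, and the threshold `1e-12` is just a numerical way of detecting the $x = 0$ case.

```python
import cmath
import math

def dirichlet_sum(N, x):
    """D_N(x) from the definition: (1 / (2 pi)) * sum over |n| <= N of exp(inx)."""
    total = sum(cmath.exp(1j * n * x) for n in range(-N, N + 1))
    return total.real / (2 * math.pi)

def dirichlet_closed(N, x):
    """D_N(x) from the closed form, with the x = 0 case handled separately."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return (2 * N + 1) / (2 * math.pi)
    return math.sin((N + 0.5) * x) / (2 * math.pi * s)

N = 5
points = [-3.0, -1.2, 0.0, 0.4, 2.5]
max_gap = max(abs(dirichlet_sum(N, x) - dirichlet_closed(N, x)) for x in points)
print(max_gap)
```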


Definition 171 
Let $f \in L^2([-\pi, \pi])$. The $N$th Cesaro-Fourier mean of $f$ is

\[ \sigma_N f(x) = \frac{1}{N+1} \sum_{k=0}^N S_k f(x). \]

We've rephrased our convergence of Fourier series to the statement that "if the Fourier coefficients are all zero, then the function is zero." And the direction we're going with this definition here is that if we can show the partial sums $S_N f$ converge to $f$, then a function whose Fourier coefficients all vanish must be a limit of zeros, but trying to do this with $S_N$ directly is our original problem statement! So this "averaged" Cesaro-Fourier mean will be an easier thing to work with, and we'll try to show that $\sigma_N f \to f$ instead.


Remark 172. We do know from real analysis that the Cesaro means of a sequence of real numbers behave better 
than the original sequence, but we don't lose any information, so we have some expectation of getting better behavior 


here as well. In particular, sequences like $\{1, -1, 1, -1, \cdots\}$ do not converge, but their Cesaro means do.
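The example in this remark is easy to check directly. The sketch below (with the hypothetical helper name `cesaro_means`) computes the running averages of the sequence $1, -1, 1, -1, \cdots$ and shows that they shrink toward 0 even though the sequence itself keeps oscillating.

```python
def cesaro_means(seq):
    """Return the running averages (s_1 + ... + s_k) / k of a finite sequence."""
    means, running = [], 0.0
    for k, s in enumerate(seq, start=1):
        running += s
        means.append(running / k)
    return means

seq = [(-1) ** k for k in range(1000)]  # 1, -1, 1, -1, ...
means = cesaro_means(seq)
print(seq[-2:], means[-2:])
```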


So next time, we'll discuss more why this convergence works: we'll show that for every $f \in L^2$, $\sigma_N f$ converges to $f$ in $L^2$. That would then show the desired result, because if all of the Fourier coefficients are zero, then $\sigma_N f$ is zero for each $N$, and thus the limit of those functions is also the zero function.


16 April 15, 2021 


We'll continue the discussion of Fourier series today. Last time, we defined the Fourier coefficients

\[ \hat{f}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt \]

for any function $f \in L^2([-\pi, \pi])$, which we can think of as the $L^2$ inner product of $f$ with $e^{inx}$ up to a constant. Defining the $N$th partial sums

\[ S_N f(x) = \sum_{n=-N}^{N} \hat{f}(n) e^{inx}, \]

we wanted to know whether $S_N f$ always converges to $f$ in $L^2$, that is, whether for all $f \in L^2([-\pi, \pi])$ we have $\lim_{N\to\infty} \|f - S_N f\|_2 = 0$.

Based on our discussion of Hilbert spaces, this question is equivalent to asking whether a function $f \in L^2([-\pi, \pi])$ with all Fourier coefficients zero must be the zero function (since we're trying to ask whether $\left\{ \frac{e^{inx}}{\sqrt{2\pi}} \right\}_{n \in \mathbb{Z}}$ is a maximal orthonormal subset). Our main step last time was to define the Cesaro-Fourier mean

\[ \sigma_N f(x) = \frac{1}{N+1} \sum_{k=0}^N S_k f(x), \]

hoping that means of sequences converge better than the sequences themselves. Our goal is then to show that $\|\sigma_N f - f\|_2 \to 0$ as $N \to \infty$, and that will give us the desired convergence result for Fourier series.


We'll first rewrite the partial Fourier sums slightly differently, much like how we previously used the Dirichlet kernel: 


Proposition 173 


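For all $f \in L^2([-\pi, \pi])$, we have

\[ \sigma_N f(x) = \int_{-\pi}^{\pi} K_N(x - t) f(t) \, dt, \qquad K_N(x) = \begin{cases} \frac{N+1}{2\pi} & x = 0 \\ \frac{1}{2\pi(N+1)} \left( \frac{\sin\left( \frac{N+1}{2} x \right)}{\sin \frac{x}{2}} \right)^2 & \text{otherwise}. \end{cases} \]

The function $K_N(x)$ is called the Fejér kernel, and it has the following properties: (1) $K_N(x) \ge 0$ and $K_N(x) = K_N(-x)$ for all $x$, (2) $K_N$ is periodic with period $2\pi$, (3) $\int_{-\pi}^{\pi} K_N(t) \, dt = 1$, and (4) for any $\delta \in (0, \pi)$ and for all $\delta \le |x| \le \pi$, we have $|K_N(x)| \le \frac{1}{2\pi(N+1) \sin^2 \frac{\delta}{2}}$.


The idea is that the Fejér kernel grows more and more concentrated at the origin as $N \to \infty$, but the area under the curve is always 1 (like the physics Dirac delta function). Here's a picture for $N = 8$:

[Figure: graph of the Fejér kernel $K_N$ on $[-\pi, \pi]$ for $N = 8$.]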


The reason we might believe that these Cesaro means converge to $f$ is that

\[ \sigma_N f(x) = \int_{-\pi}^{\pi} K_N(x - t) f(t) \, dt, \]

and $K_N$ is very sharply peaked around $t = x$, so as $N$ gets larger and larger, the main contribution to the integral comes from $f(t) \approx f(x)$ if $f$ is well-behaved enough. So then we end up with

\[ \approx f(x) \int_{-\pi}^{\pi} K_N(x - t) \, dt = f(x) \cdot 1, \]

since $K_N$ integrates to the same value over any interval of length $2\pi$ by periodicity. So that's a heuristic motivation for working with the Cesaro means here! (Some of these properties also applied when we did a similar procedure with our partial sums $S_N f(x)$, but the Dirichlet kernel is not nonnegative, and that difference actually matters a great deal in the final proof.)


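Proof. Recall that for each $k$ we have

\[ S_k f(x) = \int_{-\pi}^{\pi} D_k(x - t) f(t) \, dt \]

for the Dirichlet kernel

\[ D_k(t) = \begin{cases} \frac{2k+1}{2\pi} & t = 0 \\ \frac{\sin\left( \left(k + \frac{1}{2}\right) t \right)}{2\pi \sin \frac{t}{2}} & \text{otherwise}. \end{cases} \]

We can use this fact to find that

\[ \sigma_N f(x) = \frac{1}{N+1} \sum_{k=0}^N S_k f(x) = \int_{-\pi}^{\pi} \left( \frac{1}{N+1} \sum_{k=0}^N D_k(x - t) \right) f(t) \, dt, \]

and thus we know that the desired kernel is

\[ K_N(x) = \frac{1}{N+1} \sum_{k=0}^N D_k(x). \]

We can now substitute in our expression for $D_k$, using the variable $x$ instead of $x - t$. The case $x = 0$ can be done easily (we just have constants), and for all other $x$ we can slightly rewrite our expression as

\[ K_N(x) = \frac{1}{2\pi(N+1)} \, \frac{1}{2 \sin^2 \frac{x}{2}} \sum_{k=0}^N 2 \sin\left( \frac{x}{2} \right) \sin\left( \left(k + \frac{1}{2}\right) x \right). \]

By the trig product-to-sum identity, this simplifies to

\[ = \frac{1}{2\pi(N+1)} \, \frac{1}{2 \sin^2 \frac{x}{2}} \sum_{k=0}^N \left( \cos(kx) - \cos((k+1)x) \right), \]

and this is a telescoping sum which simplifies to

\[ = \frac{1}{2\pi(N+1)} \, \frac{1}{\sin^2 \frac{x}{2}} \cdot \frac{1}{2} \left( 1 - \cos((N+1)x) \right). \]

We can now use another trig formula $\frac{1 - \cos x}{2} = \sin^2 \frac{x}{2}$ to get

\[ = \frac{1}{2\pi(N+1)} \, \frac{1}{\sin^2 \frac{x}{2}} \, \sin^2\left( \frac{N+1}{2} x \right), \]

which is indeed the expression for our Fejér kernel.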

We can now verify the properties of the Fejér kernel directly: (1) is true because we have a manifestly nonnegative expression and $\sin^2(cx)$ is even, and (2) is true because $\sin^2$ is also periodic with half the period of the corresponding $\sin$. For (3), notice that

\[ \int_{-\pi}^{\pi} D_k(t) \, dt = \frac{1}{2\pi} \sum_{n=-k}^{k} \int_{-\pi}^{\pi} e^{int} \, dt, \]

and the integral of $e^{int}$ is zero unless $n = 0$ (by $2\pi$-periodicity), so we just pick up the $n = 0$ term and get 1. Since $K_N$ is the average of the $D_k$s, the integral of $K_N$ is also the average of the integrals of the $D_k$s, which will also be 1. Finally, for (4), notice that $\sin^2 \frac{x}{2}$ is an even function which is increasing on $[0, \pi]$. So if we pick some $\delta \in (0, \pi)$, we can say that

\[ \delta \le |x| \le \pi \implies \sin^2 \frac{x}{2} \ge \sin^2 \frac{\delta}{2}, \]

so we indeed get the expected

\[ K_N(x) = |K_N(x)| \le \frac{1}{2\pi(N+1) \sin^2 \frac{\delta}{2}} \sin^2\left( \frac{N+1}{2} x \right) \le \frac{1}{2\pi(N+1) \sin^2 \frac{\delta}{2}}. \]
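The identity $K_N = \frac{1}{N+1} \sum_{k=0}^N D_k$ and properties (1) and (3) can be checked numerically. The sketch below is illustrative only: the function names, the grid sizes, and the choice $N = 8$ are assumptions, and the Dirichlet kernel is written in its equivalent cosine form $D_k(x) = \frac{1}{2\pi}\left(1 + 2\sum_{n=1}^k \cos(nx)\right)$ to avoid dividing by $\sin \frac{x}{2}$.

```python
import math

def dirichlet(k, x):
    """D_k(x) = (1 / (2 pi)) * sum over |n| <= k of exp(inx), in cosine form."""
    return (1 + 2 * sum(math.cos(n * x) for n in range(1, k + 1))) / (2 * math.pi)

def fejer_average(N, x):
    """K_N(x) as the average of D_0, ..., D_N."""
    return sum(dirichlet(k, x) for k in range(N + 1)) / (N + 1)

def fejer_closed(N, x):
    """K_N(x) from the closed form, with the x = 0 case handled separately."""
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return (N + 1) / (2 * math.pi)
    return (math.sin((N + 1) * x / 2) / s) ** 2 / (2 * math.pi * (N + 1))

def integrate(g, samples=2000):
    """Uniform Riemann sum over [-pi, pi)."""
    h = 2 * math.pi / samples
    return h * sum(g(-math.pi + k * h) for k in range(samples))

N = 8
grid = [-math.pi + 2 * math.pi * k / 200 for k in range(201)]
max_gap = max(abs(fejer_average(N, x) - fejer_closed(N, x)) for x in grid)
min_value = min(fejer_closed(N, x) for x in grid)
total_mass = integrate(lambda t: fejer_closed(N, t))
print(max_gap, min_value, total_mass)
```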


Now, we can prove convergence of the Cesaro means $\sigma_N f$ to $f$ by first doing it for continuous functions: we showed that the continuous functions vanishing at both endpoints are dense in $L^2$ (so we can show convergence appropriately), and continuous functions vanishing at both endpoints can indeed be treated as $2\pi$-periodic. So the subspace of $2\pi$-periodic continuous functions is dense in $L^2$, and we'll consider this dense subset first because it's where the heuristic argument we made above applies rigorously.


Theorem 174 (Fejér) 


Let $f \in C([-\pi, \pi])$ be $2\pi$-periodic (so $f(-\pi) = f(\pi)$). Then $\sigma_N f \to f$ uniformly on $[-\pi, \pi]$.


In other words, we have an even stronger result than $L^2$ convergence, now that we're limiting ourselves to continuous functions and have the stronger uniform norm. But this does not imply that the Fourier series of $f$ converges pointwise to $f$: there are indeed Fourier series representations of continuous functions that diverge at a point. Instead, it's the Cesaro mean and the Fejér kernel that help us out here!


Proof. First, we extend $f$ to all of $\mathbb{R}$ by periodicity (defining it so that $f(x + 2\pi) = f(x)$ for all $x \in \mathbb{R}$). Our function is then an element of $C(\mathbb{R})$ (still continuous), and it is $2\pi$-periodic, so it is uniformly continuous and bounded on all of $\mathbb{R}$ (that is, $\|f\|_\infty = \sup_{x \in [-\pi, \pi]} |f(x)| < \infty$).

We wish to show that $\sigma_N f$ converges uniformly to $f$, which means that for all $\varepsilon > 0$ we need to find an $M$ so that for all $N \ge M$, we have $|\sigma_N f(x) - f(x)| < \varepsilon$ for all $x$. Indeed, for any $\varepsilon > 0$, by uniform continuity of $f$, there exists some $\delta > 0$ (which we may take to be less than $\pi$) so that for all $|y - z| < \delta$, we have $|f(y) - f(z)| < \frac{\varepsilon}{2}$. So now we can choose $M \in \mathbb{N}$ so that for all $N \ge M$, we have

\[ \frac{2\|f\|_\infty}{(N+1) \sin^2 \frac{\delta}{2}} < \frac{\varepsilon}{2} \]

(we can do this because the left-hand side converges to 0 as $N \to \infty$). Now because $f$ and $K_N$ are $2\pi$-periodic, we can write the Cesaro mean as

\[ \sigma_N f(x) = \int_{-\pi}^{\pi} K_N(x - t) f(t) \, dt = \int_{x-\pi}^{x+\pi} K_N(\tau) f(x - \tau) \, d\tau \]

by a change of variables (which is allowed because we're doing integrals over continuous functions, and thus we can use the Riemann integral), and now we have the product of $2\pi$-periodic functions, so the integral of that is the same over any interval of length $2\pi$: switching back to $t$ from $\tau$,

\[ = \int_{-\pi}^{\pi} K_N(t) f(x - t) \, dt. \]


nv 
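We can now say that for all $N \ge M$ and for all $x \in [-\pi, \pi]$, we have

\[ |\sigma_N f(x) - f(x)| = \left| \int_{-\pi}^{\pi} K_N(t) f(x - t) \, dt - \int_{-\pi}^{\pi} K_N(t) f(x) \, dt \right|, \]

where we've added in a $\int_{-\pi}^{\pi} K_N(t) \, dt$ integral to the $f(x)$ term, which is okay because that integral equals 1 and $f(x)$ does not depend on the integration variable $t$. Combining the integrals by linearity gives us

\[ = \left| \int_{-\pi}^{\pi} K_N(t) \left( f(x - t) - f(x) \right) dt \right|. \]

We'll use the triangle inequality and then split this integral into two parts now:

\[ \le \int_{-\pi}^{\pi} \left| K_N(t) \left( f(x - t) - f(x) \right) \right| dt = \int_{|t| < \delta} K_N(t) |f(x - t) - f(x)| \, dt + \int_{\delta \le |t| \le \pi} K_N(t) |f(x - t) - f(x)| \, dt \]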

(also using the fact that $K_N$ is always nonnegative). And now we can use our bounds above to simplify this: for the first term, we know that $|(x - t) - x| < \delta$ over the bounds of integration, so $|f(x - t) - f(x)| < \frac{\varepsilon}{2}$. And for the second term, we know that $|f(x - t) - f(x)| \le 2\|f\|_\infty$ because both $f(x - t)$ and $f(x)$ have magnitude at most $\|f\|_\infty$ for a continuous function, and when $|t| \ge \delta$ we can use condition (4) of the Fejér kernel. Putting this all together, we find the inequality

\[ \le \frac{\varepsilon}{2} \int_{|t| < \delta} K_N(t) \, dt + \frac{2\|f\|_\infty}{2\pi(N+1) \sin^2 \frac{\delta}{2}} \int_{\delta \le |t| \le \pi} dt. \]

We can now bound both integrals here by the integral over the entire region to get

\[ \le \frac{\varepsilon}{2} + \frac{2\|f\|_\infty}{(N+1) \sin^2 \frac{\delta}{2}} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]

by our choice of $M$. So we've indeed shown uniform convergence ($\sigma_N f$ is eventually close enough to $f$ for large enough $N$), and we're done.
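Fejér's theorem can be observed numerically on a simple example. The sketch below uses the continuous $2\pi$-periodic function $f(x) = |x|$ as an illustrative test case (not from the lecture), with its Fourier coefficients computed by hand: $\hat{f}(0) = \frac{\pi}{2}$ and $\hat{f}(n) = \frac{(-1)^n - 1}{\pi n^2}$ for $n \ne 0$. It also relies on the identity $\sigma_N f(x) = \sum_{|n| \le N} \left(1 - \frac{|n|}{N+1}\right) \hat{f}(n) e^{inx}$, which follows from averaging the partial sums $S_0 f, \cdots, S_N f$.

```python
import math

def f(x):
    """A continuous 2 pi-periodic test function, given on [-pi, pi]."""
    return abs(x)

def f_hat(n):
    """Hand-computed Fourier coefficients of |x| for n >= 0 (real and even in n)."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)

def cesaro_mean(N, x):
    """sigma_N f(x), pairing the n and -n terms into cosines."""
    total = f_hat(0)
    for n in range(1, N + 1):
        weight = 1 - n / (N + 1)
        total += 2 * weight * f_hat(n) * math.cos(n * x)
    return total

def sup_error(N, points=401):
    """Largest value of |sigma_N f - f| over a uniform grid on [-pi, pi]."""
    grid = [-math.pi + 2 * math.pi * k / (points - 1) for k in range(points)]
    return max(abs(cesaro_mean(N, x) - f(x)) for x in grid)

errors = {N: sup_error(N) for N in (5, 20, 80)}
print(errors)
```

The grid maximum only approximates the true supremum, so this is evidence for uniform convergence rather than a proof of it.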


Remark 175. This same proof can be modified if instead of knowing that $K_N(x) \ge 0$ (which we know for the Fejér kernel), we have that

\[ \sup_N \int_{-\pi}^{\pi} |K_N(x)| \, dx < \infty. \]

Then we can show the same uniform convergence by modifying our proof above. But if we try to plug in our Dirichlet kernel here, the condition is not satisfied, since

\[ \int_{-\pi}^{\pi} |D_N(x)| \, dx \sim \log N. \]

So having "almost all of the properties" isn't enough for us to get the analogous results for the Dirichlet kernel!
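The contrast in this remark shows up clearly in a numerical experiment. The sketch below approximates $\int_{-\pi}^{\pi} |D_N|$ and $\int_{-\pi}^{\pi} |K_N|$ with a midpoint rule; the helper names, the sample count, and the choice of $N \in \{4, 16, 64\}$ are assumptions for illustration. The Fejér integrals stay at 1, while the Dirichlet integrals keep growing slowly with $N$.

```python
import math

def dirichlet(N, x):
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return (2 * N + 1) / (2 * math.pi)
    return math.sin((N + 0.5) * x) / (2 * math.pi * s)

def fejer(N, x):
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return (N + 1) / (2 * math.pi)
    return (math.sin((N + 1) * x / 2) / s) ** 2 / (2 * math.pi * (N + 1))

def l1_norm(kernel, N, samples=20000):
    """Midpoint-rule approximation of the integral of |kernel(N, x)| over [-pi, pi]."""
    h = 2 * math.pi / samples
    return h * sum(abs(kernel(N, -math.pi + (k + 0.5) * h)) for k in range(samples))

sizes = (4, 16, 64)
dirichlet_norms = [l1_norm(dirichlet, N) for N in sizes]
fejer_norms = [l1_norm(fejer, N) for N in sizes]
print(dirichlet_norms, fejer_norms)
```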


Now that we've proven that the Cesaro means of a continuous function converge uniformly to that function, we want to show that the Cesaro means of an $L^2$ function converge to that function in $L^2$, which would show the condition on the Hilbert space that we want and show convergence of the Fourier series as well. We'll first need the following result:


Proposition 176 


For all $f \in L^2([-\pi, \pi])$, we have $\|\sigma_N f\|_2 \le \|f\|_2$.


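Proof. We'll first do this for $2\pi$-periodic continuous functions. First suppose that $f \in C([-\pi, \pi])$ is $2\pi$-periodic: extend $f$ to all of $\mathbb{R}$ as before, and then the Cesaro mean is $\sigma_N f(x) = \int_{-\pi}^{\pi} f(x - t) K_N(t) \, dt$. Thus, we can write out

\[ \|\sigma_N f\|_2^2 = \int_{-\pi}^{\pi} |\sigma_N f(x)|^2 \, dx = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(x - s) \overline{f(x - t)} K_N(s) K_N(t) \, ds \, dt \, dx. \]

All of these functions are continuous, so we can change the order of integration by Fubini's theorem to get

\[ = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} K_N(s) K_N(t) \left( \int_{-\pi}^{\pi} f(x - s) \overline{f(x - t)} \, dx \right) ds \, dt. \]

By Cauchy-Schwarz, this can be bounded by

\[ \le \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} K_N(s) K_N(t) \|f(\cdot - s)\|_2 \|f(\cdot - t)\|_2 \, ds \, dt, \]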


where $f(\cdot - s)$ denotes the function that maps $x \mapsto f(x - s)$. And now $\|f(\cdot - s)\|_2$ comes from integrating the periodic function $|f(\cdot - s)|^2$ over an interval of length $2\pi$, so we can replace that expression with $\|f\|_2$ (just shifting to another length-$2\pi$ interval). Doing the same with $f(\cdot - t)$ gives us

\[ = \|f\|_2^2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} K_N(s) K_N(t) \, ds \, dt = \|f\|_2^2 \]

because the integral of $K_N$ is 1. This gives us the desired inequality for $2\pi$-periodic continuous functions, and now to extend it to all functions in $L^2$, suppose we have some general $f \in L^2$. From exercises, we know that there exists a sequence $\{f_n\}_n$ of $2\pi$-periodic continuous functions that converge to $f$ in $L^2$, meaning that $\|f_n - f\|_2 \to 0$. So from the definition of the Cesaro means, this means that $\|\sigma_N f_n - \sigma_N f\|_2 \to 0$ for any fixed $N$ as $n \to \infty$, leading us to

\[ \|\sigma_N f\|_2 = \lim_{n\to\infty} \|\sigma_N f_n\|_2 \le \lim_{n\to\infty} \|f_n\|_2 \]

(using the $2\pi$-periodic case), and this last result is $\|f\|_2$ because $f_n$ converges to $f$ in $L^2$.


So now we're almost done, and combining the two results above will give us what we want: 


Theorem 177 
For all $f \in L^2([-\pi, \pi])$, $\|\sigma_N f - f\|_2 \to 0$ as $N \to \infty$. Therefore, if $\hat{f}(n) = 0$ for all $n$, then $f = 0$ (since $\sigma_N f = 0$ for all $N$).


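Proof. (We only need to prove the result in the first sentence; the second follows directly as stated.) Let $f \in L^2([-\pi, \pi])$, and let $\varepsilon > 0$. By density of the $2\pi$-periodic continuous functions, there exists some $2\pi$-periodic $g \in C([-\pi, \pi])$ so that $\|f - g\|_2 < \frac{\varepsilon}{3}$. Because $\sigma_N g \to g$ uniformly on $[-\pi, \pi]$, there exists some $M$ so that for all $N \ge M$ and for all $x \in [-\pi, \pi]$, we have $|\sigma_N g(x) - g(x)| < \frac{\varepsilon}{3\sqrt{2\pi}}$.

Now for all $N \ge M$, the triangle inequality tells us that

\[ \|\sigma_N f - f\|_2 \le \|\sigma_N f - \sigma_N g\|_2 + \|\sigma_N g - g\|_2 + \|g - f\|_2. \]

The first term is $\|\sigma_N(f - g)\|_2$ (we can check this from the definition), and by Proposition 176, that is at most $\|f - g\|_2 < \frac{\varepsilon}{3}$. Meanwhile, the last term is also bounded by $\frac{\varepsilon}{3}$, and the middle term is $\left( \int_{-\pi}^{\pi} |\sigma_N g(x) - g(x)|^2 \, dx \right)^{1/2} < \left( 2\pi \cdot \left( \frac{\varepsilon}{3\sqrt{2\pi}} \right)^2 \right)^{1/2} = \frac{\varepsilon}{3}$. So putting this all back into our expression gives us

\[ \|\sigma_N f - f\|_2 < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, \]

completing the proof.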


So we've now seen a concrete application of the general machinery we've built up for Hilbert spaces! In summary, we've shown that the normalized exponentials form a maximal orthonormal set, so that the partial Fourier sums of $f$ converge to $f$ in $L^2$. But as previously mentioned, we don't have pointwise convergence everywhere. Instead, from $L^2$ convergence alone we can only say that there is a subsequence that converges to $f$ pointwise almost everywhere. And in fact, Carleson's theorem is a deep result in analysis that tells us that for all $f \in L^2$, $S_N f(x) \to f(x)$ almost everywhere.
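Since the normalized exponentials form an orthonormal basis, Parseval's identity applies to them, and this can be illustrated on an example. The sketch below takes $f(x) = x$ on $[-\pi, \pi]$ as an illustrative choice (not from the lecture), for which a direct integration by parts gives $\hat{f}(0) = 0$ and $\hat{f}(n) = \frac{i(-1)^n}{n}$ for $n \ne 0$, and $\|f\|_2^2 = \frac{2\pi^3}{3}$. With $e_n = \frac{e^{inx}}{\sqrt{2\pi}}$ we have $\langle f, e_n\rangle = \sqrt{2\pi}\, \hat{f}(n)$, so the partial sums of $\sum_n |\langle f, e_n\rangle|^2$ should increase toward $\|f\|_2^2$.

```python
import math

def f_hat_abs_sq(n):
    """|f_hat(n)|^2 for f(x) = x on [-pi, pi], using f_hat(n) = i (-1)^n / n."""
    return 0.0 if n == 0 else 1.0 / (n * n)

def parseval_partial(N):
    """Sum over |n| <= N of |<f, e_n>|^2, where <f, e_n> = sqrt(2 pi) * f_hat(n)."""
    return 2 * math.pi * sum(f_hat_abs_sq(n) for n in range(-N, N + 1))

norm_sq = 2 * math.pi ** 3 / 3  # integral of x^2 over [-pi, pi]
gaps = [norm_sq - parseval_partial(N) for N in (10, 100, 1000)]
print(gaps)
```

Each gap is positive (Bessel's inequality for the finite sub-collection) and the gaps shrink as more basis elements are included.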

We can also ask questions about the convergence of Fourier series in other $L^p$ spaces, since all of the definitions also make sense there. It is known additionally that for all $1 < p < \infty$, we always have $\|S_N f - f\|_p \to 0$, and that this is false for $p = 1, \infty$. But deeper harmonic analysis is needed to prove statements like this, and in particular we would need to learn how to work with singular integral operators.

In this class, though, this is as far as we'll go with Fourier series, and next time, we'll move on to the topic of 
minimizers over closed convex sets and (as a consequence) how to identify the dual of a Hilbert space with the 


Hilbert space itself in a canonical way.


17 April 22, 2021


Last time, we discussed orthonormal bases, considering the concrete question of whether complex exponentials formed an orthonormal basis for $L^2([-\pi, \pi])$. Today, we'll go back to a general discussion of Hilbert spaces, and the rest of the course from here on will be general theory and some concrete applications to particular problems.

Our first topic today will be length minimizers: recall that we can describe a norm on V/W for subspaces of a 
normed vector space, and we did so via an infimum. It makes sense to ask whether this minimal distance is actually 


achieved: 


Theorem 178 


Let $C$ be a nonempty closed subset of a Hilbert space $H$ which is convex, meaning that for all $v_1, v_2 \in C$, we have $t v_1 + (1 - t) v_2 \in C$ for all $t \in [0, 1]$. Then there exists a unique element $v \in C$ with $\|v\| = \inf_{u \in C} \|u\|$ (this is a length minimizer).


The convexity condition can alternatively be stated as "the line segment between any two elements of $C$ is contained in $C$." And to connect this with our discussion earlier, one such example of a set would be $v + W$ for some closed subspace $W$ of $H$ and some $v \in H$.


Remark 179. The condition that $C$ is closed is required: for example, we can let $C$ be an open disk that does not contain the origin, in which case the minimum norm is not achieved (because the infimum is approached on the boundary, which is not in $C$). And convexity is also required: for example, otherwise we could take the complement of an open disk centered at the origin, in which case the minimum norm is achieved on the entire boundary circle, so the minimizer is not unique.


Proof. We should recall that $a = \inf S$ if and only if $a$ is a lower bound for $S$, and there exists a sequence $\{s_n\}$ in $S$ with $s_n \to a$. If we let $d = \inf_{u \in C} \|u\|$, this is some finite number because norms are always bounded from below by 0, and $C$ is nonempty. So there exists some sequence $\{u_n\}$ in $C$ such that $\|u_n\| \to d$.

We claim that this sequence is actually Cauchy. To see that, let $\varepsilon > 0$. Because of convergence of $\|u_n\|$ to $d$, there exists some $N$ so that for all $n \ge N$, we have

\[ 2\|u_n\|^2 < 2d^2 + \frac{\varepsilon^2}{2}. \]

Then for all $n, m \ge N$, the parallelogram law tells us that

\[ \|u_m - u_n\|^2 = 2\|u_m\|^2 + 2\|u_n\|^2 - 4\left\| \frac{u_n + u_m}{2} \right\|^2, \]

and now because $\frac{u_n + u_m}{2}$ lies on the line segment between $u_n$ and $u_m$ (taking $t = \frac{1}{2}$), convexity tells us that it is also in $C$. Therefore, $\left\| \frac{u_n + u_m}{2} \right\|^2 \ge d^2$, and thus

\[ \|u_m - u_n\|^2 \le 2\|u_m\|^2 - 2d^2 + 2\|u_n\|^2 - 2d^2 < \frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2} = \varepsilon^2 \]

by our choice of $N$, and taking a square root shows that the sequence $\{u_n\}$ is indeed Cauchy. Because our Hilbert space is complete, this means that the sequence also converges, and thus there is some $v \in H$ such that $u_n \to v$, and $v \in C$ as well because our subset $C$ is closed. So now continuity of the norm tells us that

\[ \|v\| = \lim_{n\to\infty} \|u_n\| = d, \]


and thus we've found our minimizer $v \in C$. To show uniqueness, suppose that $v, \bar{v}$ are both in $C$ and have norm $d$. Then the parallelogram law tells us that

\[ \|v - \bar{v}\|^2 = 2\|v\|^2 + 2\|\bar{v}\|^2 - 4\left\| \frac{v + \bar{v}}{2} \right\|^2 \le 2d^2 + 2d^2 - 4d^2 = 0, \]

again using that $\frac{v + \bar{v}}{2}$ is also in $C$ by convexity, and thus we must have $v - \bar{v} = 0 \implies v = \bar{v}$.
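For a concrete instance of a length minimizer, take $C = u + W$ where $W$ is a one-dimensional subspace of $\mathbb{R}^3$. The sketch below is illustrative only: the vectors `u` and `w` and the helper name `min_norm_point` are hypothetical, and the formula used (subtracting from `u` its component along `w`) is the standard orthogonal projection, which is specific to this kind of set rather than a general algorithm for arbitrary closed convex sets.

```python
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def min_norm_point(u, w):
    """Minimal-norm element of the line C = {u + t w : t real} in R^n."""
    t = -inner(u, w) / inner(w, w)  # choose t so that u + t w is orthogonal to w
    return [ui + t * wi for ui, wi in zip(u, w)]

u = [3.0, 1.0, 2.0]
w = [1.0, 1.0, 0.0]
v = min_norm_point(u, w)

# Compare against other points of C sampled along the line.
others = [[ui + t * wi for ui, wi in zip(u, w)]
          for t in [k / 10 for k in range(-50, 51)]]
smallest_other = min(norm(p) for p in others)
print(v, norm(v), smallest_other)
```

The minimizer found this way is orthogonal to `w`, which is the finite-dimensional picture behind using length minimizers to decompose a Hilbert space along a closed subspace.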


We'll obtain some important consequences from this result. The first one is how to decompose our Hilbert space using a closed linear subspace, much like we usually like to do in $\mathbb{R}^n$ and $\mathbb{C}^n$.


Theorem 180 
Let $H$ be a Hilbert space, and let $W \subset H$ be a subspace. Then the orthogonal complement

\[ W^\perp = \{ u \in H : \langle u, w\rangle = 0 \ \ \forall w \in W \} \]

is a closed linear subspace of $H$. Furthermore, if $W$ is closed, then $H = W \oplus W^\perp$: in other words, for all $u \in H$, we can write $u = w + w^\perp$ for some unique $w \in W$ and $w^\perp \in W^\perp$.


A picture to keep in mind is the case where $H$ is $\mathbb{R}^2$ and $W$ is the x-axis: then $W^\perp$ would be the y-axis, and we're saying that all elements can be broken up into a component along the x-axis and a component along the y-axis.


Proof. Showing that $W^\perp$ is a subspace is clear, because if $\langle u_1, w \rangle = 0$ and $\langle u_2, w \rangle = 0$ for all $w \in W$, any linear combination of $u_1$ and $u_2$ will also be orthogonal to all $w \in W$ by linearity of the inner product. Furthermore, $W \cap W^\perp = \{0\}$, because any element $w \in W$ that is also in $W^\perp$ must satisfy $\langle w, w \rangle = 0 \implies w = 0$.

To show that $W^\perp$ is closed, let $\{u_n\}$ be a sequence in $W^\perp$ converging to $u \in H$. We wish to show that $\langle u, w \rangle = 0$ for all $w \in W$, so that $u \in W^\perp$ as well. Indeed, by continuity of the inner product, we have
\[ \langle u, w \rangle = \lim_{n \to \infty} \langle u_n, w \rangle = \lim_{n \to \infty} 0 = 0, \]
so that our sequential limit is also in our subspace $W^\perp$.
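It remains to show that $H = W \oplus W^\perp$ if $W$ is closed. The result is clear for $W = H$, since $W^\perp = \{0\}$ and $H = H \oplus \{0\}$ is a trivial decomposition. Otherwise, if $W \ne H$, any $u \in W$ decomposes trivially as $u = u + 0$, so let $u \in H \setminus W$ (that is, $u$ is in $H$ but not $W$), and define the set
\[ C = u + W = \{u + w : w \in W\}. \]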


This set $C$ is nonempty because it contains $u$, and it is convex because for any two elements $u + w_1, u + w_2 \in C$ (for $w_1, w_2 \in W$) and for any $t \in [0, 1]$, we have
\[ t(u + w_1) + (1 - t)(u + w_2) = (t + (1 - t))u + tw_1 + (1 - t)w_2 = u + (tw_1 + (1 - t)w_2), \]
and the last term is in $W$ because subspaces are closed under linear combinations. So we now need to show that $C$ is closed: indeed, if $u + w_n$ is a sequence of elements in $C$ that converges to some element $v \in H$, we know that
\[ u + w_n \to v \implies w_n \to v - u, \]
and because $W$ is closed, $w_n$ must converge to some element in $W$. Thus $v - u \in W$, and thus $v = u + w$ for some $w \in W$, which is exactly the definition of being in $C$. So $C$ is indeed closed.
So returning to the problem, if we want to write an element $u$ of $H$ as a sum of a part in $W$ and a part in $W^\perp$, it makes sense that our component in $W^\perp$ will be the norm minimizer in $C$ (keeping the $\mathbb{R}^2$ example from above in mind). So applying Theorem 178, because $C$ is closed and convex, there is some unique $v \in C$ with
\[ \|v\| = \inf_{c \in C} \|c\| = \inf_{w \in W} \|u + w\|. \]


Since $v \in C$, we know that $u - v \in W$, so we will write $u = (u - v) + v$. Our goal is to show that $v \in W^\perp$, and we do this with a variational argument (in physics, this is the Euler-Lagrange equations, and it is another way of phrasing properties of the infimum). If $w \in W$, define the function of a real variable $t$
\[ f(t) = \|v + tw\|^2 = \|v\|^2 + t^2\|w\|^2 + 2t\,\mathrm{Re}\langle v, w \rangle, \]
which is a polynomial in $t$. We know that $f(t)$ has a minimum at $t = 0$, because all elements of the form $v + tw$ are in $C$, and we know the minimizer of the norm uniquely occurs at $v$. So $f'(0) = 0$, and thus
\[ 2\,\mathrm{Re}\langle v, w \rangle = 0. \]
So the real part of the inner product is zero, and now we can repeat this argument but with $\|v + itw\|$ instead of $\|v + tw\|$, which will show us that
\[ \mathrm{Re}\langle v, iw \rangle = \mathrm{Im}\langle v, w \rangle = 0. \]


Therefore, $\langle v, w \rangle = 0$, and since this argument was true for all $w \in W$, we must have $v \in W^\perp$. It remains to show that this decomposition is unique, and this is true because $W \cap W^\perp = \{0\}$: more specifically, if $u = w_1 + w_1^\perp = w_2 + w_2^\perp$, that means that $w_1 - w_2 = w_2^\perp - w_1^\perp$ is in both $W$ and $W^\perp$, and thus both sides of this equation are 0. So $w_1 = w_2$ and $w_1^\perp = w_2^\perp$, showing uniqueness.
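In finite dimensions the decomposition $u = w + w^\perp$ can be computed directly. The following is a minimal sketch, assuming $H = \mathbb{C}^3$ and that $W$ is given by a hypothetical orthonormal list `onb`; the helper names and sample vectors are illustrative only.

```python
import math

def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def decompose(u, onb):
    """Split u as w + w_perp, where w lies in the span W of the orthonormal list onb."""
    w = [0j] * len(u)
    for e in onb:
        coeff = inner(u, e)  # coefficient of u along e
        w = [wi + coeff * ei for wi, ei in zip(w, e)]
    w_perp = [ui - wi for ui, wi in zip(u, w)]
    return w, w_perp

r = 1 / math.sqrt(2)
onb = [[1, 0, 0], [0, r, 1j * r]]  # assumed orthonormal basis of W
u = [2 + 1j, 1, 1]
w, w_perp = decompose(u, onb)
```

Here `w_perp` is orthogonal to every element of `onb`, and the norms satisfy the Pythagorean identity $\|u\|^2 = \|w\|^2 + \|w^\perp\|^2$.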


The following result is left as an exercise for us: 


Theorem 181 


If $W \subset H$ is a subspace, then $(W^\perp)^\perp$ is the closure $\overline{W}$ of $W$. In particular, if $W$ is closed, then $(W^\perp)^\perp = W$.


Now that we have this decomposition $u = w + w^\perp$ for our subspace $W$, we can construct a map which takes in $u$ and outputs $w$. If we use the $\mathbb{R}^2$ example from above, we can see that this map is a projection onto the x-axis, and more generally we can make the following definition:


Definition 182 
Let $P : H \to H$ be a bounded linear operator. Then $P$ is a projection if $P^2 = P$.


Proposition 183 
Let $H$ be a Hilbert space, and let $W \subset H$ be a closed subspace. Then the map $\Pi_W : H \to H$ sending $v = w + w^\perp$ (for $w \in W$, $w^\perp \in W^\perp$) to $w$ is a projection operator.


Proof. First, we show that $\Pi_W$ is linear. Indeed, if $v_1 = w_1 + w_1^\perp$ and $v_2 = w_2 + w_2^\perp$, and we have $\lambda_1, \lambda_2 \in \mathbb{C}$, then
\[ \lambda_1 v_1 + \lambda_2 v_2 = (\lambda_1 w_1 + \lambda_2 w_2) + (\lambda_1 w_1^\perp + \lambda_2 w_2^\perp). \]
The two terms on the right-hand side are in $W$ and $W^\perp$, respectively, by closure of subspaces under linear combinations. So $\Pi_W(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 w_1 + \lambda_2 w_2$, which is indeed $\lambda_1 \Pi_W(v_1) + \lambda_2 \Pi_W(v_2)$, as desired. We can also see that $\Pi_W$ is bounded, because when $v = w + w^\perp$,
\[ \|v\|^2 = \|w + w^\perp\|^2 = \|w\|^2 + \|w^\perp\|^2 \ge \|w\|^2 \]
(since the inner product cross term is zero when $\langle w, w^\perp \rangle = 0$). Therefore, $\|\Pi_W(v)\| \le \|v\|$, and the operator norm is at most 1. And now we just need to check that $\Pi_W^2 = \Pi_W$: if $v = w + w^\perp$, then $\Pi_W(v) = w$, and then
\[ \Pi_W(\Pi_W(v)) = \Pi_W(w) = w = \Pi_W(v), \]
and since this is true for all $v$, we have $\Pi_W^2 = \Pi_W$, as desired.
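As a quick worked instance of this proposition, take the running picture $H = \mathbb{R}^2$ with $W$ the x-axis. The matrix below is simply the standard coordinate description of that projection, written out as an illustration.

```latex
\Pi_W \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} x \\ 0 \end{pmatrix},
\qquad
\Pi_W^2
  = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}^2
  = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
  = \Pi_W,
\qquad
\left\| \Pi_W \begin{pmatrix} x \\ y \end{pmatrix} \right\|
  = |x| \le \sqrt{x^2 + y^2}.
```

The last inequality is an equality exactly on the x-axis, so the operator norm of this projection equals 1.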


Our next application of length minimizers will be the following important result: 


Theorem 184 (Riesz Representation Theorem) 


Let $H$ be a Hilbert space. Then for all $f \in H'$, there exists a unique $v \in H$ so that $f(u) = \langle u, v \rangle$ for all $u \in H$.


In other words, every element of the dual can be realized as an inner product with a fixed vector. We've seen something similar before when we proved that the dual of $\ell^p$ is identified with $\ell^q$ (for $\frac{1}{p} + \frac{1}{q} = 1$) via a pairing, and the $p = q = 2$ case is the example relevant to Hilbert spaces.


Proof. If such a $v$ exists, it is unique: if $f(u) = \langle u, v \rangle = \langle u, \tilde{v} \rangle$ for all $u \in H$, then $\langle u, v - \tilde{v} \rangle = 0$ for all $u \in H$. Setting $u = v - \tilde{v}$ tells us that $v - \tilde{v} = 0$. So we just need to construct such a $v$ that works.
The easiest case is $f = 0$, because in that case, we take $v = 0$. Otherwise, there exists some $u_1 \in H$ so that $f(u_1) \ne 0$, and we take $u_0 = \frac{u_1}{f(u_1)}$ so that $f(u_0) = 1$. We can then define the nonempty set
\[ C = \{u \in H : f(u) = 1\} = f^{-1}(\{1\}), \]
which is closed because $f$ is a continuous function, $\{1\}$ only has one element so is closed, and the preimage of a closed set by a continuous function is a closed set. We claim that $C$ is convex: indeed, if $u_1, u_2 \in C$ and $t \in [0, 1]$, then
\[ f(tu_1 + (1 - t)u_2) = tf(u_1) + (1 - t)f(u_2) = t \cdot 1 + (1 - t) \cdot 1 = 1, \]
so that $tu_1 + (1 - t)u_2$ is also in $C$. So now by Theorem 178, there exists $v_0 \in C$ so that $\|v_0\| = \inf_{u \in C} \|u\|$, and we define $v = \frac{v_0}{\|v_0\|^2}$ (noting that $v_0 \ne 0$ because $f(v_0) = 1$).


We claim that this is the $v$ that we want; in other words, let's check that $f(u) = \langle u, v \rangle$. Indeed, if we let $N = f^{-1}(\{0\}) = \{w \in H : f(w) = 0\}$ be the nullspace of $f$, then we can check that $C = \{v_0 + w : w \in N\}$ and that $\|v_0\| = \inf_{w \in N} \|v_0 + w\|$. So by the argument that we made earlier in Theorem 180 using $\|v_0 + tw\|^2$, $v_0 \in N^\perp$, and now for any $u \in H$,
\[ f(u - f(u)v_0) = f(u) - f(u)f(v_0) = 0 \]
by linearity of $f$, and thus $u = (u - f(u)v_0) + f(u)v_0$ is a sum of a component in $N$ and a component in $N^\perp$. Therefore,
\[ \langle u, v \rangle = \frac{1}{\|v_0\|^2} \langle u, v_0 \rangle = \frac{1}{\|v_0\|^2} \left[ \langle u - f(u)v_0, v_0 \rangle + f(u) \langle v_0, v_0 \rangle \right]. \]
The first term here has $u - f(u)v_0 \in N$ and $v_0 \in N^\perp$, so that inner product is zero, and we're left with
\[ \langle u, v \rangle = f(u) \frac{\langle v_0, v_0 \rangle}{\|v_0\|^2} = f(u), \]
as desired. So we've found $v$ (a scaled version of the minimizer) so that $f(u) = \langle u, v \rangle$ for all $u$, concluding the proof.
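The construction in this proof can be traced in $\mathbb{C}^n$. The sketch below assumes a hypothetical functional $f(u) = \sum_j a_j u_j$ with a chosen coefficient list `a`. By Cauchy-Schwarz, the minimum-norm point of $\{f = 1\}$ is $v_0 = \bar{a}/\|a\|^2$, and rescaling as in the proof should recover $v = \bar{a}$. All names are illustrative.

```python
def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def functional(a, u):
    """Hypothetical linear functional f(u) = sum_j a_j u_j on C^n."""
    return sum(x * y for x, y in zip(a, u))

def representer_via_minimizer(a):
    """Follow the proof: v0 minimizes the norm on {f = 1}, then v = v0 / ||v0||^2."""
    norm_a_sq = sum(abs(x) ** 2 for x in a)
    v0 = [complex(x).conjugate() / norm_a_sq for x in a]  # f(v0) = 1
    norm_v0_sq = sum(abs(x) ** 2 for x in v0)
    return [x / norm_v0_sq for x in v0]

a = [1 + 2j, -3j, 0.5]
v = representer_via_minimizer(a)
u = [2, 1j, 1 - 1j]
```

For these sample values, `functional(a, u)` and `inner(u, v)` agree up to rounding, and `v` coincides with the entrywise conjugate of `a`.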


We'll study adjoint operators next time — we defined it as a map from dual spaces to dual spaces, but because we 
can identify dual spaces of Hilbert spaces with themselves, adjoint operators will be essentially regular operators, and 
we'll soon see how they relate to solving equations on Hilbert spaces and why they are the analogs of the transpose 


matrix in finite-dimensional linear algebra as well. 


18 April 27, 2021 


We discussed the Riesz representation theorem last time, which states that for a Hilbert space $H$, we can identify each $f \in H' = B(H, \mathbb{C})$ with a unique element $v \in H$ such that $f(u) = \langle u, v \rangle$ for all $u \in H$. (In other words, every continuous linear functional on $H$ can be realized as an inner product with a fixed vector.)


We can use this to expand on a concept we've touched on previously in an assignment: 


Theorem 185 
Let $H$ be a Hilbert space, and let $A : H \to H$ be a bounded linear operator. Then there exists a unique bounded linear operator $A^* : H \to H$, known as the adjoint of $A$, satisfying
\[ \langle Au, v \rangle = \langle u, A^*v \rangle \]
for all $u, v \in H$. In addition, we have that $\|A^*\| = \|A\|$.


Proof. We can show uniqueness similarly to how we showed it in the Riesz representation theorem: if $\langle u, A_1^* v \rangle = \langle u, A_2^* v \rangle$ for all $u, v$ for two potential candidates $A_1^*, A_2^*$, then $\langle u, A_1^* v - A_2^* v \rangle = 0$ for all $u, v$, and we can always set $u = A_1^* v - A_2^* v$ to show that we must have $A_1^* v = A_2^* v$ for all $v$, meaning that $A_1^*$ and $A_2^*$ were the same operator to begin with.

To show that such an operator does exist, first fix $v \in H$, and define a map $f_v : H \to \mathbb{C}$ by $f_v(u) = \langle Au, v \rangle$. This is a linear map (in the argument $u$) because for any $u_1, u_2 \in H$ and $\lambda_1, \lambda_2 \in \mathbb{C}$, we have
\[ f_v(\lambda_1 u_1 + \lambda_2 u_2) = \langle A(\lambda_1 u_1 + \lambda_2 u_2), v \rangle = \langle \lambda_1 Au_1 + \lambda_2 Au_2, v \rangle \]
by linearity of $A$, and then this simplifies to
\[ = \lambda_1 \langle Au_1, v \rangle + \lambda_2 \langle Au_2, v \rangle = \lambda_1 f_v(u_1) + \lambda_2 f_v(u_2) \]
by linearity in the first argument of the inner product. We claim this is also a continuous linear operator (so that it is actually an element of the dual). Indeed, we can check that if $\|u\| = 1$,
\[ |f_v(u)| = |\langle Au, v \rangle| \le \|Au\| \cdot \|v\| \]
by the Cauchy-Schwarz inequality, and this is bounded by $\|A\| \cdot \|v\|$. Therefore, $\|f_v\| \le \|A\| \cdot \|v\|$ (which is a constant), and thus $f_v \in H'$. By the Riesz representation theorem, we can therefore find a (unique) element, which we denote $A^*v$, of $H$ satisfying
\[ \langle Au, v \rangle = f_v(u) = \langle u, A^*v \rangle. \]
We now need to show that $A^*$ is a bounded linear operator. For linearity, let $v_1, v_2 \in H$ and let $\lambda_1, \lambda_2 \in \mathbb{C}$. We know that for all $u \in H$,
\[ \boxed{\langle u, A^*(\lambda_1 v_1 + \lambda_2 v_2) \rangle} = \langle Au, \lambda_1 v_1 + \lambda_2 v_2 \rangle, \]
and now by conjugate linearity in the second variable, this simplifies to
\[ = \overline{\lambda_1} \langle Au, v_1 \rangle + \overline{\lambda_2} \langle Au, v_2 \rangle = \overline{\lambda_1} \langle u, A^*v_1 \rangle + \overline{\lambda_2} \langle u, A^*v_2 \rangle. \]
Pulling the complex numbers back in shows that this is
\[ = \boxed{\langle u, \lambda_1 A^*v_1 + \lambda_2 A^*v_2 \rangle}. \]
The only way for these two boxed expressions to be equal for all $u$ is if the second arguments are equal: $A^*(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 A^*(v_1) + \lambda_2 A^*(v_2)$, which is the desired linearity result for $A^*$.

We now show that $A^*$ is bounded with $\|A^*\| = \|A\|$. Take a unit-norm vector $v$ (so $\|v\| = 1$): if $A^*v = 0$, then clearly $\|A^*v\| \le \|A\|$. Otherwise, we still want to show that same inequality. Suppose $A^*v \ne 0$. Then
\[ \|A^*v\|^2 = \langle A^*v, A^*v \rangle = \langle AA^*v, v \rangle \]
by definition of the adjoint, and now by Cauchy-Schwarz this is bounded by
\[ \le \|AA^*v\| \cdot \|v\| = \|AA^*v\| \le \|A\| \, \|A^*v\|. \]
Dividing by the nonzero constant $\|A^*v\|$ yields $\|A^*v\| \le \|A\|$, as desired, and now taking the sup over all $v$ with $\|v\| = 1$ yields $\|A^*\| \le \|A\|$.


To finish, we need to show equality. For all $u, v \in H$, we have
\[ \langle A^*u, v \rangle = \overline{\langle v, A^*u \rangle} = \overline{\langle Av, u \rangle} = \langle u, Av \rangle, \]
so the adjoint of the adjoint of $A$ is $A$ itself (since $\langle u, Av \rangle = \langle A^*u, v \rangle = \langle u, (A^*)^*v \rangle$). Therefore, we can flip the roles of $A^*$ and $A$ in this argument to find that
\[ \|A\| = \|(A^*)^*\| \le \|A^*\|, \]
and putting the inequalities together yields $\|A\| = \|A^*\|$ as desired.


Let's see a concrete example of what these adjoint operators look like: 


Example 186 


If our Hilbert space is $H = \mathbb{C}^n$, so that $u$ is an n-dimensional vector, then we know that
\[ (Au)_i = \sum_{j=1}^n A_{ij} u_j \]
for some fixed complex numbers $A_{ij}$, and we can represent $A$ as a finite-dimensional matrix.

To determine the adjoint of $A$, we need to figure out the operator $B$ that satisfies $\langle Au, v \rangle = \langle u, Bv \rangle$. Towards that, notice that
\[ \langle Au, v \rangle = \sum_{i=1}^n (Au)_i \overline{v_i} = \sum_{i,j} A_{ij} u_j \overline{v_i}, \]
and switching the order of summation yields
\[ = \sum_{j=1}^n u_j \overline{\sum_{i=1}^n \overline{A_{ij}} v_i} = \langle u, A^*v \rangle, \]
where the adjoint of $A$ acts on $v$ as
\[ (A^*v)_i = \sum_{j=1}^n \overline{A_{ji}} v_j. \]
So for matrices, the adjoint is also representable by a matrix, and it is the conjugate transpose of $A$.
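The identity $\langle Au, v \rangle = \langle u, A^*v \rangle$ for the conjugate transpose is easy to confirm numerically. This is a minimal sketch with an arbitrarily chosen $2 \times 2$ complex matrix and sample vectors; the function names are illustrative.

```python
def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def matvec(A, u):
    """Matrix-vector product: (Au)_i = sum_j A[i][j] * u[j]."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def conjugate_transpose(A):
    """Entry (j, i) of the result is the complex conjugate of A[i][j]."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

A = [[1 + 1j, 2], [0, 3 - 2j]]  # sample matrix, chosen arbitrarily
u = [1, 1j]
v = [2 - 1j, 4]
lhs = inner(matvec(A, u), v)
rhs = inner(u, matvec(conjugate_transpose(A), v))
```

The two numbers `lhs` and `rhs` agree up to floating-point rounding, for any choice of `A`, `u`, and `v`.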


Example 187 


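Now consider the space $\ell^2$, in which an operator is described with a double sequence $\{A_{ij}\}$ in $\mathbb{C}$ so that
\[ \sum_{i,j} |A_{ij}|^2 = \lim_{N \to \infty} \sum_{i=1}^N \sum_{j=1}^N |A_{ij}|^2 < \infty. \]
Specifically, we define $A : \ell^2 \to \ell^2$ via
\[ (Aa)_i = \sum_{j=1}^\infty A_{ij} a_j. \]
We can check by the Cauchy-Schwarz inequality that this is a bounded linear operator as long as the condition $\sum_{i,j} |A_{ij}|^2 < \infty$ is satisfied (the order of summation does not matter because all terms in the double sum are nonnegative). So $A \in B(\ell^2, \ell^2)$, and for all $a, b \in \ell^2$, we have
\[ \langle Aa, b \rangle_{\ell^2} = \sum_i \sum_j A_{ij} a_j \overline{b_i} = \sum_j a_j \overline{\left( \sum_i \overline{A_{ij}} b_i \right)} = \langle a, A^*b \rangle, \]
where we define the adjoint similarly to the finite-dimensional case:
\[ (A^*b)_i = \sum_{j=1}^\infty \overline{A_{ji}} b_j. \]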
Finally, we can try doing an integral instead of an infinite sum: 


Example 188 
Let $K \in C([0, 1] \times [0, 1])$, and define the map $A : L^2([0, 1]) \to L^2([0, 1])$ via
\[ Af(x) = \int_0^1 K(x, y) f(y) \, dy. \]
We can then check that the adjoint $A^*$ is defined as
\[ A^*g(x) = \int_0^1 \overline{K(y, x)} g(y) \, dy, \]
so we're again flipping the indices and taking a complex conjugate.
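The check for the integral operator follows the same pattern as the matrix case, with sums replaced by integrals. Here is a sketch of the computation, assuming the interchange of integrals is justified by Fubini's theorem (which applies because $K$ is continuous, hence bounded, and $f, g \in L^2([0,1]) \subset L^1([0,1])$).

```latex
\langle Af, g \rangle
  = \int_0^1 \left( \int_0^1 K(x, y) f(y) \, dy \right) \overline{g(x)} \, dx
  = \int_0^1 f(y) \, \overline{\left( \int_0^1 \overline{K(x, y)} \, g(x) \, dx \right)} \, dy
  = \langle f, A^* g \rangle,
\qquad
A^* g(y) = \int_0^1 \overline{K(x, y)} \, g(x) \, dx .
```

Renaming the variables ($x \leftrightarrow y$) in the last formula gives the expression for $A^* g(x)$ stated in the example.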
Theorem 189
Let $H$ be a Hilbert space, and let $A : H \to H$ be a bounded linear operator. Then
\[ (\mathrm{Ran}(A))^\perp = \mathrm{Null}(A^*), \]
where $\mathrm{Ran}(A)$ is the range of $A$ (the set of all vectors of the form $Au$), and $\mathrm{Null}(A^*)$ is the nullspace of $A^*$ (the set of all vectors $u$ for which $A^*u = 0$).


In particular, if we know that the range of $A$ is a closed subspace, then always being able to solve $Au = v$ (surjectivity) is equivalent to knowing that the adjoint is one-to-one (injectivity). Indeed, a closed range satisfies $\mathrm{Ran}(A) = (\mathrm{Ran}(A)^\perp)^\perp = \mathrm{Null}(A^*)^\perp$ by Theorem 181, and when $A^*$ is injective this is the orthogonal complement of the zero vector, which is the whole space.


Proof. Note that $v \in \mathrm{Null}(A^*)$ if and only if $\langle u, A^*v \rangle = 0$ for all $u \in H$, which is equivalent to $\langle Au, v \rangle = 0$ for all $u \in H$. So $v$ is orthogonal to all elements in $\mathrm{Ran}(A)$, and that's equivalent to saying that $v \in \mathrm{Ran}(A)^\perp$. (All steps here go in both directions, so this shows the equivalence of the two sets.)
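A small matrix example makes the identity $\mathrm{Ran}(A)^\perp = \mathrm{Null}(A^*)$ tangible. The sketch below assumes $H = \mathbb{C}^2$ and a hand-picked rank-one matrix whose range is spanned by $(1, i)$; the vector `v` is a candidate element of $\mathrm{Null}(A^*)$. All names and values are illustrative.

```python
def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def matvec(A, u):
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def conjugate_transpose(A):
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

A = [[1, 0], [1j, 0]]            # range of A is span{(1, i)}
A_star = conjugate_transpose(A)  # equals [[1, -i], [0, 0]]
v = [1j, 1]                      # candidate element of Null(A*)

samples = [[1, 0], [0, 1], [2 - 1j, 3j], [-1j, 0.5]]
range_pairings = [inner(matvec(A, u), v) for u in samples]
```

Here `matvec(A_star, v)` is the zero vector, and every entry of `range_pairings` vanishes, so `v` is orthogonal to the range of `A`.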


This is essentially an infinite-dimensional version of rank-nullity, and we want to see if we can say similar things about the solutions to linear equations that we could in the finite-dimensional case (our input needs to satisfy certain linear relations, and then our final solution is unique up to a linear subspace). But before we get to that, the operators whose solvability we'll study have particular important properties on bounded sequences. We take for granted that a bounded linear operator takes bounded sets to bounded sets in finite-dimensional spaces, and so we can find a convergent subsequence using Heine-Borel. So the point is that there is some compactness hidden in here in $\mathbb{R}^n$ and $\mathbb{C}^n$, so we need to study some facts about compactness in Hilbert spaces before we can talk about solvability of equations.


Definition 190 


Let X be a metric space. A subset K C X is compact if every sequence of elements in K has a subsequence 


converging to an element of K. 


Example 191 


By the Pigeonhole Principle, all finite subsets are compact. 


As just described, we also have the following result from real analysis: 


Theorem 192 (Heine-Borel) 
A subset $K \subset \mathbb{R}$ (also $\mathbb{R}^n$ and $\mathbb{C}^n$) is compact if and only if $K$ is closed and bounded.


Examples on the real line include closed intervals and also the set $\{0\} \cup \{\frac{1}{n} : n \in \mathbb{N}\}$. We know this doesn't hold for arbitrary metric spaces or even Banach spaces, and in fact it's still not true for Hilbert spaces:


Example 193 


Let $H$ be an infinite-dimensional Hilbert space. Then the closed ball
\[ F = \{u \in H : \|u\| \le 1\} \]
is a closed and bounded set, but it is not compact.

This is because we can let $\{e_n\}_{n=1}^\infty$ be a countably infinite orthonormal subset of $H$ (it doesn't need to be a basis), which we find by Gram-Schmidt, so that all elements $e_n$ are in $F$, but for $n \ne k$,
\[ \|e_n - e_k\|^2 = \|e_n\|^2 + \|e_k\|^2 - 2\,\mathrm{Re}\langle e_n, e_k \rangle = 2. \]
So the distance between any two distinct elements of the sequence is $\sqrt{2}$, so there is no convergent subsequence (since no subsequence can be Cauchy).

Motivated by this, we know that all compact sets are closed and bounded, and thus we want to figure out an additional condition that guarantees compactness for a Hilbert space (so that we can verify compactness without using the subsequence definition). And this is in fact related to something that comes up in 18.100B in a different context when thinking about the space of continuous functions.


Definition 194 


Let $H$ be a Hilbert space. A subset $K \subset H$ has equi-small tails with respect to a countable orthonormal subset $\{e_k\}$ if for all $\varepsilon > 0$, there is some $N \in \mathbb{N}$ so that for all $v \in K$, we have
\[ \sum_{k > N} |\langle v, e_k \rangle|^2 < \varepsilon^2. \]


We know that the series for any given $v$ converges by Bessel's inequality, so that the inequality above will eventually hold for some $N$ for each $v$. But this equi-small tails requirement is a more "uniform" condition on the rate of convergence: we need to be able to pick an $N$ that works for all $v \in K$ at the same time.


Example 195 
Any finite set K has equi-small tails with respect to any countable orthonormal subset (we can take the maximum 


of finitely many Ns). 


The motivation for this definition is that, as mentioned above, finite sets are always compact, so we should hope that this additional uniformity gives us compactness. We won't get to that result today, but here's some more motivation for why this is the correct condition to add, building on the $\{0\} \cup \{\frac{1}{n} : n \in \mathbb{N}\}$ example from above:


Theorem 196 


Let $H$ be a Hilbert space, and let $\{v_n\}_n$ be a convergent sequence with $v_n \to v$. If $\{e_k\}$ is a countable orthonormal subset, then $K = \{v_n : n \in \mathbb{N}\} \cup \{v\}$ is compact, and $K$ has equi-small tails with respect to $\{e_k\}$.


Proof. Compactness will be left as an exercise for us. For equi-small tails, the idea is that for sufficiently large $n$, $v_n$ will be close to $v$, so we can use $v$ to take care of all but finitely many of the points in our sequence. Let $\varepsilon > 0$: since $v_n \to v$, there is some $M \in \mathbb{N}$ so that for all $n \ge M$, we have $\|v_n - v\| < \frac{\varepsilon}{2}$. We choose $N$ large enough so that for this fixed $v$,
\[ \sum_{k > N} |\langle v, e_k \rangle|^2 + \max_{1 \le n \le M - 1} \sum_{k > N} |\langle v_n, e_k \rangle|^2 < \frac{\varepsilon^2}{4}. \]
(There are only finitely many tail sums here, each of which tends to 0 as $N \to \infty$ by Bessel's inequality, so we can choose a single $N$ that is large enough for all of them.) We claim that this $N$ uniformly bounds our tails: indeed,
\[ \sum_{k > N} |\langle v, e_k \rangle|^2 < \frac{\varepsilon^2}{4} < \varepsilon^2, \]
and for all $1 \le n \le M - 1$ we also have
\[ \sum_{k > N} |\langle v_n, e_k \rangle|^2 < \frac{\varepsilon^2}{4} < \varepsilon^2. \]
So we just need to check the condition for $n \ge M$: we can write
\[ \left( \sum_{k > N} |\langle v_n, e_k \rangle|^2 \right)^{1/2} = \left( \sum_{k > N} |\langle v_n - v, e_k \rangle + \langle v, e_k \rangle|^2 \right)^{1/2}, \]
and this is the $\ell^2$ norm of the sum of two sequences indexed by $k$, so by the triangle inequality this is bounded by
\[ \le \left( \sum_{k > N} |\langle v_n - v, e_k \rangle|^2 \right)^{1/2} + \left( \sum_{k > N} |\langle v, e_k \rangle|^2 \right)^{1/2}. \]
The second term is at most $\frac{\varepsilon}{2}$, and then the first term is bounded by Bessel's inequality by $\|v_n - v\|$. Since we chose $M$ large enough so that that norm is less than $\frac{\varepsilon}{2}$ for all $n \ge M$, we indeed have that this is bounded by
\[ \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
as desired.


Next time, we'll prove that if we have a subset of a separable Hilbert space which is closed, bounded, and has 
equi-small tails with respect to an orthonormal basis (which we know exists), then we have compactness, and then 


we'll rephrase that fact in a way that doesn’t involve Hilbert spaces and go from there. 
We'll continue discussing compactness today. Recall that a subset $K$ of a metric space $X$ is compact if every sequence $\{x_n\}_n$ in $K$ has a subsequence that is convergent in $K$. While being closed and bounded is equivalent to being compact in $\mathbb{R}^n$, this is not true in general Hilbert spaces (for example, take the orthonormal basis vectors in $\ell^2$). So we need an additional condition: last time, we proved that if we have a convergent sequence $\{v_n\}_n$ in $H$ converging to $v$, then the subset $K = \{v_n : n \in \mathbb{N}\} \cup \{v\}$ is compact, and it has equi-small tails with respect to any orthonormal subset. Here, the definition is that if $\{e_k\}_k$ is a countable orthonormal subset of $H$, then for all $\varepsilon > 0$, there exists some $N \in \mathbb{N}$ such that for all $\tilde{v} \in K$ (either an element of the sequence or $v$), we have
\[ \sum_{k > N} |\langle \tilde{v}, e_k \rangle|^2 < \varepsilon^2. \]
(We know that this sum over all $k$ is bounded, and thus convergent, by Bessel's inequality for any individual $\tilde{v}$, so we can always find an $N$ that makes this work for a fixed $\tilde{v}$, but the condition requires it simultaneously for all $\tilde{v} \in K$. So we can think of "equi-small tails" as really meaning "uniformly small tails.") It turns out that this condition suffices (and is necessary) for compactness:


Theorem 197 


Let $H$ be a separable Hilbert space, and let $\{e_k\}_k$ be an orthonormal basis of $H$. Then a subset $K \subset H$ is compact if and only if $K$ is closed, bounded, and has equi-small tails with respect to $\{e_k\}$.


Proof. For the forward direction, first suppose that $K$ is compact. We know by general metric space theory that $K$ is then closed and bounded, and we'll show that $K$ has equi-small tails with respect to $\{e_k\}$ by contradiction. Suppose otherwise: then there exists some $\varepsilon_0 > 0$ such that for each natural $N$, there is some $u_N \in K$ such that
\[ \sum_{k > N} |\langle u_N, e_k \rangle|^2 \ge \varepsilon_0^2. \]
This then gives us a sequence $\{u_N\}$ (by picking such a $u_N$ for every natural number $N$), and thus by the assumption of compactness, there is some subsequence $\{v_m\} = \{u_{N_m}\}$ and some $v \in K$ such that $v_m \to v$. But we also know that for all $m \in \mathbb{N}$, $\sum_{k > m} |\langle v_m, e_k \rangle|^2 \ge \varepsilon_0^2$, because $v_m = u_{N_m}$ is the $m$th or later term of the original sequence (so summing over $k > N_m$ is at most the value we get summing over $k > m$). That means that the subset $\{v_m : m \in \mathbb{N}\} \cup \{v\}$ does not have equi-small tails, which is a contradiction of our previous theorem. So if $K$ is compact, then it must have equi-small tails, as desired.

On the other hand, suppose $K$ is closed, bounded, and has equi-small tails. We wish to show that any sequence $\{u_n\}$ in $K$ has a subsequence converging in $K$. Because $K$ is closed, any sequence in $K$ that converges will converge in $K$, so we just need to show that there is some convergent subsequence. We know that any bounded sequence of complex numbers has a convergent subsequence (showing convergence of the real and imaginary parts by Bolzano-Weierstrass), so the plan is to expand $\{u_n\}$ in terms of the orthonormal basis of $H$ and think about the coefficients along each basis vector. Since $K$ is bounded, there is some $C > 0$ (only depending on $K$) so that for all $n$, $\|u_n\| \le C$. Therefore, for all $k$ and for all $n$, the "Fourier coefficient" satisfies
\[ |\langle u_n, e_k \rangle| \le \|u_n\| \cdot \|e_k\| \le C, \]
and thus for each fixed $k$, we get a bounded sequence of coefficients along the $k$th basis vector: specifically, we have the bounded sequence of numbers
\[ \{\langle u_n, e_k \rangle\}_n \]
in $\mathbb{C}$. Thus, by Bolzano-Weierstrass (fixing $k = 1$), there is some subsequence $\{\langle u_{n_1(j)}, e_1 \rangle\}_j$ of $\{\langle u_n, e_1 \rangle\}_n$ which converges in $\mathbb{C}$ (in other words, we have a subsequence of the original $u_n$s in which the first entry converges). And now $\{\langle u_{n_1(j)}, e_2 \rangle\}_j$ is still a bounded sequence, and thus by Bolzano-Weierstrass again we have a further subsequence $\{\langle u_{n_2(j)}, e_2 \rangle\}_j$ which converges. So we now have a subsequence of the original $u_n$s in which the first and second entries both converge (since the first subsequence converges in the first entry, and thus any subsequence of it will also converge in the first entry).

We can repeat this argument arbitrarily many times: taking further subsequences of the $u_{n_1(j)}$s gives us, for each $\ell$, a subsequence $u_{n_\ell(j)}$ such that $\{\langle u_{n_\ell(j)}, e_\ell \rangle\}_j$ converges, meaning that we have convergence along our sequence in the first $\ell$ entries. If we now define
\[ v_\ell = u_{n_\ell(\ell)} \quad \forall \ell \in \mathbb{N}, \]
then the $\{v_\ell\}_\ell$ form a subsequence of the $\{u_n\}_n$ with convergence in the $k$th entry (for any fixed $k$) as $\ell \to \infty$. This on its own doesn't mean that the sequence converges, but here is where we will use the fact that $K$ has equi-small tails.


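It suffices to show that $\{v_\ell\}_\ell$ is Cauchy (because $H$ is complete and $K$ is closed). For any $\varepsilon > 0$, having equi-small tails tells us that there is some $N$ such that
\[ \sum_{k > N} |\langle v_\ell, e_k \rangle|^2 < \frac{\varepsilon^2}{16} \]
for all $\ell \in \mathbb{N}$. Now because the $N$ sequences $\{\langle v_\ell, e_1 \rangle\}_\ell$ through $\{\langle v_\ell, e_N \rangle\}_\ell$ each converge, we can then find an $M$ such that for all $\ell, m \ge M$,
\[ \sum_{k=1}^N |\langle v_\ell - v_m, e_k \rangle|^2 < \frac{\varepsilon^2}{4}. \]
We claim that this $M$ is the one that we want for our sequence $\{v_\ell\}$: indeed,
\[ \|v_\ell - v_m\| = \left( \sum_{k=1}^N |\langle v_\ell - v_m, e_k \rangle|^2 + \sum_{k > N} |\langle v_\ell - v_m, e_k \rangle|^2 \right)^{1/2} \]
(because we have an orthonormal basis, the norm squared is the sum of the squared absolute values of the Fourier coefficients). Now using the fact that $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, we can bound this as
\[ \le \left( \sum_{k=1}^N |\langle v_\ell - v_m, e_k \rangle|^2 \right)^{1/2} + \left( \sum_{k > N} |\langle v_\ell - v_m, e_k \rangle|^2 \right)^{1/2}. \]
By our choice of $M$, the first term is at most $\frac{\varepsilon}{2}$, and we can use the $\ell^2$ triangle inequality for the second term, thinking of that second term as the norm of the difference of the sequences $\{\langle v_\ell, e_k \rangle\}_k$ and $\{\langle v_m, e_k \rangle\}_k$. Thus we have the bound
\[ \|v_\ell - v_m\| \le \frac{\varepsilon}{2} + \left( \sum_{k > N} |\langle v_\ell, e_k \rangle|^2 \right)^{1/2} + \left( \sum_{k > N} |\langle v_m, e_k \rangle|^2 \right)^{1/2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon, \]
where the last inequality comes from how we chose $N$. So our subsequence is Cauchy, thus convergent, and thus $K$ is compact.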


Example 198 


Let $K$ be the set (not subspace) of sequences $\{a_k\}_k$ in $\ell^2$ satisfying $|a_k| \le 2^{-k}$. This set is known as the Hilbert cube, and it is compact.
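Compactness of the Hilbert cube can be read off from Theorem 197 with the standard basis of $\ell^2$. The computation below is a sketch of the boundedness and tail estimates, using only the geometric series; closedness follows because the constraints $|a_k| \le 2^{-k}$ are preserved under coordinatewise limits.

```latex
\|a\|^2 = \sum_{k \ge 1} |a_k|^2 \le \sum_{k \ge 1} 4^{-k} = \frac{1}{3},
\qquad
\sum_{k > N} |\langle a, e_k \rangle|^2
  = \sum_{k > N} |a_k|^2
  \le \sum_{k > N} 4^{-k}
  = \frac{4^{-(N+1)}}{1 - \frac{1}{4}}
  = \frac{4^{-N}}{3}
  \quad \text{for every } a \in K .
```

Given $\varepsilon > 0$, any $N$ with $\frac{4^{-N}}{3} < \varepsilon^2$ works for all $a \in K$ at once, which is the equi-small tails condition.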


It may seem unwieldy that we make this definition with respect to an orthonormal basis, but we can characterize 


compact sets in another way as well: 


Theorem 199 
A subset $K \subset H$ is compact if and only if $K$ is closed, bounded, and for all $\varepsilon > 0$, there exists a finite-dimensional subspace $W \subset H$ so that for all $u \in K$, $\inf_{w \in W} \|u - w\| < \varepsilon$.


In other words, our additional condition is that we can approximate the points in K by a finite-dimensional subspace. 
This proof also involves a similar “diagonal argument,” and notably it works for non-separable Hilbert spaces as well, 
but we can read about the proof on our own. This should be a believable result, because the equi-small tail condition 
we worked with in our previous proof was basically saying that we can approximate points in K by the first N vectors 
in our orthonormal basis (since the contribution from the other basis vectors is small). 

We'll now start to talk about various classes of operators, and we'll start with the simplest ones. From linear 
algebra, we know that matrices are operators in finite-dimensional vector spaces, and we can represent them with an 


array of numbers. We can now generalize that definition to our current setting. 


Fact 200 
From here on, H will be a Hilbert space, and we'll denote B(H, H) by B(H). 


Definition 201 


A bounded linear operator T € B(H) is a finite rank operator if the range of T (a subspace of H) is finite- 
dimensional. We denote this as T € R(H). 
Example 202

If $H$ is a finite-dimensional Hilbert space, then every linear operator is of finite rank. For a more interesting example, for any positive integer $n$, the operator $T_n : \ell^2 \to \ell^2$ given by
\[ T_n a = \left( a_1, \frac{a_2}{2}, \dots, \frac{a_n}{n}, 0, 0, \dots \right) \]
is a finite rank operator (because the image is spanned by the first $n$ standard basis vectors).


Proposition 203 
The set R(H) is a subspace of B(H). 


Proof. The range of a scalar multiple of an operator is contained in the original range (and equal to it when the scalar is nonzero), and the sum of two finite rank operators has range contained in the sum of the individual ranges (which is also finite-dimensional).


We'll now prove that these finite rank operators are really like matrices: 


Theorem 204 
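An operator $T \in B(H)$ is in $R(H)$ if and only if there exists an orthonormal set $\{e_k\}_{k=1}^L$ and an array of constants $\{c_{ij}\}_{i,j=1}^L \subset \mathbb{C}$ such that
\[ Tu = \sum_{i,j=1}^L c_{ij} \langle u, e_j \rangle e_i. \]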


Proof. The backwards direction is clear: if $T$ has such a representation, then the range of $T$ is contained in the span of the $L$ vectors $\{e_1, \dots, e_L\}$ and is thus finite-dimensional. Now suppose that $T$ is a finite rank operator. Then we can find an orthonormal basis $\{\bar{e}_k\}_{k=1}^N$ of the range of $T$, such that
\[ Tu = \sum_{k=1}^N \langle Tu, \bar{e}_k \rangle \bar{e}_k \]
(since $Tu$ is in the range, it must be this particular combination of the orthonormal basis vectors). Now by the definition of the adjoint operator, we can rewrite this sum as
\[ = \sum_{k=1}^N \langle u, T^*\bar{e}_k \rangle \bar{e}_k = \sum_{k=1}^N \langle u, v_k \rangle \bar{e}_k, \]
where we've defined $v_k = T^*\bar{e}_k$. If we now apply the Gram-Schmidt process to the vectors $\{\bar{e}_1, \dots, \bar{e}_N, v_1, \dots, v_N\}$, we get an orthonormal subset $\{e_1, \dots, e_L\}$ with the same span as our original $\bar{e}_k$s and $v_k$s. Thus, there exist constants $a_{ki}, b_{kj}$ so that (expanding in terms of the new orthonormal subset)
\[ \bar{e}_k = \sum_{i=1}^L a_{ki} e_i, \qquad v_k = \sum_{j=1}^L b_{kj} e_j. \]
Thus, substituting back in,
\[ Tu = \sum_{i,j=1}^L \left( \sum_{k=1}^N a_{ki} \overline{b_{kj}} \right) \langle u, e_j \rangle e_i, \]
and now the term in the inner parentheses is our desired $c_{ij}$.
And with this characterization, we can now describe our finite rank linear operators more explicitly: for example, the nullspace of $T$ contains the set of vectors orthogonal to all of the $e_k$s.


Theorem 205 


If T € R(H), then T* € R(H), and for any A, B € B(H), ATB € R(H). 


In other words, $R(H)$ is a "star-closed, two-sided ideal in the space of bounded linear operators": it's closed under two-sided multiplication and adjoints.


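Proof. We'll leave the closure under multiplication as an exercise: the main point is that if $T$ has a finite-dimensional range, the range of $AT$ is also finite-dimensional, and whatever happens with $B$ doesn't really matter. For closure under adjoints, if $T$ is a finite rank operator, then we can write
\[ Tu = \sum_{i,j=1}^L c_{ij} \langle u, e_j \rangle e_i, \]
and thus
\[ \langle u, T^*v \rangle = \langle Tu, v \rangle = \left\langle \sum_{i,j=1}^L c_{ij} \langle u, e_j \rangle e_i, v \right\rangle. \]
By linearity in the first entry, we can rewrite this inner product as
\[ = \sum_{i,j} c_{ij} \langle u, e_j \rangle \langle e_i, v \rangle, \]
and we can now use conjugate linearity to pull things into the second component instead:
\[ = \left\langle u, \sum_{i,j} \overline{c_{ij}} \, \overline{\langle e_i, v \rangle} \, e_j \right\rangle = \left\langle u, \sum_{i,j} \overline{c_{ij}} \langle v, e_i \rangle e_j \right\rangle. \]
But since this is true for all $u \in H$, we've shown that
\[ \left\langle u, T^*v - \sum_{i,j} \overline{c_{ij}} \langle v, e_i \rangle e_j \right\rangle = 0 \]
for all $u, v \in H$, and thus we must have $T^*v = \sum_{i,j=1}^L \overline{c_{ij}} \langle v, e_i \rangle e_j$ for all $v \in H$, and thus $T^* \in R(H)$ as well: in fact, we can recover the coefficients in terms of the coefficients for $T$ by reindexing as
\[ T^*v = \sum_{i,j=1}^L \overline{c_{ji}} \langle v, e_j \rangle e_i. \]
Thus, the coefficients of the matrix governing $T^*$ are obtained by taking the conjugate transpose of the ones for $T$.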
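The matrix description of finite rank operators, and the conjugate-transpose rule for their adjoints, can be tested on a small example. This sketch assumes $H = \mathbb{C}^3$, takes the first two standard basis vectors as the orthonormal set, and uses an arbitrarily chosen coefficient array `c`; all names are illustrative.

```python
def inner(u, v):
    """Inner product on C^n: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply_finite_rank(c, basis, u):
    """Compute T u = sum_{i,j} c[i][j] <u, e_j> e_i for an orthonormal list basis."""
    out = [0j] * len(u)
    for i, e_i in enumerate(basis):
        for j, e_j in enumerate(basis):
            coeff = c[i][j] * inner(u, e_j)
            out = [o + coeff * x for o, x in zip(out, e_i)]
    return out

def conjugate_transpose(c):
    n = len(c)
    return [[complex(c[j][i]).conjugate() for j in range(n)] for i in range(n)]

basis = [[1, 0, 0], [0, 1, 0]]  # assumed orthonormal set e_1, e_2
c = [[1, 2j], [0, 3]]           # sample coefficient array
u = [1 + 1j, 2, -1j]
v = [1j, 1, 5]

Tu = apply_finite_rank(c, basis, u)
T_star_v = apply_finite_rank(conjugate_transpose(c), basis, v)
```

The pairing `inner(Tu, v)` matches `inner(u, T_star_v)`, and any vector orthogonal to both `basis` elements, such as `[0, 0, 1]`, is sent to zero by `apply_finite_rank`.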


Since the set of finite rank linear operators is a subspace of the Banach space of bounded linear operators, which comes with a norm, it makes sense to ask if the subspace of finite rank operators is closed (under the norm). In other words, if $T_n \in R(H)$, and $\|T_n - T\| \to 0$ as $n \to \infty$, we want to know whether $T \in R(H)$. It turns out the answer is no:
Example 206

Let $T_n : \ell^2 \to \ell^2$ be a sequence of operators defined as
\[ T_n a = \left( a_1, \frac{a_2}{2}, \dots, \frac{a_n}{n}, 0, 0, \dots \right). \]

We can imagine that the limit $T$ of these operators is the infinite-dimensional "diagonal matrix" with entries $(1, \frac{1}{2}, \frac{1}{3}, \dots)$: specifically, defining
\[ Ta = \left( a_1, \frac{a_2}{2}, \frac{a_3}{3}, \dots \right), \]
we can check that $\|T - T_n\| \le \frac{1}{n+1}$, but $T$ is not of finite rank (since $T(k e_k) = e_k$ is in the range for each standard basis vector $e_k$). In other words, the space of finite rank linear operators (which are nice because we can solve linear equations involving them using matrices) is not closed. But we still want to know about the closure of $R(H)$, and the hope is that we still have a useful characterization:


Definition 207 
An operator $K \in B(H)$ is a compact operator if $\overline{K(\{u \in H : \|u\| \le 1\})}$, the closure of the image of the unit ball under $K$, is compact.


We'll show next time that K is a compact operator if and only if it is in the closure of R(H), meaning that there is 
a sequence of finite rank operators converging to K. These compact operators will come up in useful problems — for 
example, T in our example above is compact, and the inverse of many differential operators will turn out to be compact 
as well. And as a sanity check before we do the proof next time, finite rank operators are indeed compact operators, 
because the image of the unit ball will be a bounded subset of a finite-dimensional subspace, and thus the closure of 


that image is a closed and bounded subset of a finite-dimensional subspace, which is compact by Heine-Borel. 


20 May 4, 2021 


Last lecture, we introduced the concept of a compact operator: an operator $K \in B(H)$ (recall that $H$ always denotes a Hilbert space) is compact if $\overline{K(\{\|u\| \le 1\})}$, the closure of the image of the closed unit ball, is compact in $H$. These operators came up in our discussion of limits of finite rank operators, and we'll show today that the set of compact operators is indeed the correct closure.


Example 208 
Some illustrative examples of compact operators include $K : \ell^2 \to \ell^2$ sending $a = (a_1, a_2, a_3, \dots)$ to $\left(a_1, \frac{a_2}{2}, \frac{a_3}{3}, \dots\right)$, as well as $T : L^2([0,1]) \to L^2([0,1])$ sending $f(x)$ to $\int_0^1 K(x, y) f(y)\,dy$ for some continuous function $K : [0,1] \times [0,1] \to \mathbb{R}$. The latter is particularly important because it comes up in solutions to differential equations: if we take

$$K(x, y) = \begin{cases} (x - 1)y, & 0 \le y \le x \le 1, \\ x(y - 1), & 0 \le x \le y \le 1, \end{cases}$$

then we can check that $u(x) = \int_0^1 K(x, y) f(y)\,dy$ satisfies the differential equation $u'' = f$, $u(0) = u(1) = 0$.
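The claim about the kernel $K(x, y)$ can be sanity-checked numerically. The following is a minimal sketch, not part of the lecture: it assumes the test case $f \equiv 1$ (our own choice), for which the boundary value problem has the exact solution $u(x) = \frac{1}{2}x(x - 1)$, and it approximates the integral with a midpoint rule. The function names are hypothetical.

```python
def green_kernel(x, y):
    """Kernel K(x, y) from Example 208: (x - 1) y for y <= x, x (y - 1) for x <= y."""
    return (x - 1.0) * y if y <= x else x * (y - 1.0)


def apply_integral_operator(f, x, n=2000):
    """Approximate u(x) = integral_0^1 K(x, y) f(y) dy with the midpoint rule."""
    h = 1.0 / n
    return sum(green_kernel(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n)) * h


def u_exact(x):
    """Exact solution of u'' = 1 with u(0) = u(1) = 0 (assumed test case)."""
    return 0.5 * x * (x - 1.0)


def f(y):
    return 1.0


# Compare the integral formula against the exact solution at a few sample points.
errors = [abs(apply_integral_operator(f, x) - u_exact(x)) for x in (0.0, 0.25, 0.5, 0.9, 1.0)]
max_error = max(errors)
print(max_error)
```

The boundary conditions come for free here because $K(0, y) = K(1, y) = 0$ for every $y$.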
Example 209

In contrast, even a simple-looking operator like the identity $I$ on $\ell^2$ is not compact, because (as we've previously demonstrated by looking at the standard basis vectors) the closed unit ball is not compact. And this argument works to show that the identity is never compact on an infinite-dimensional Hilbert space.


Theorem 210 
Let $H$ be a separable Hilbert space. Then a bounded linear operator $T \in B(H)$ is a compact operator if and only if there exists a sequence $\{T_n\}_n$ of finite rank operators such that $\|T - T_n\| \to 0$. (In other words, the set of compact operators is the closure of the set of finite rank operators $R(H)$.)


Proof. First, suppose $T$ is compact. Since $H$ is separable, it has an orthonormal basis $\{e_k\}_k$, and by compactness, $\overline{\{Tu : \|u\| \le 1\}}$ is compact, meaning that it is closed, bounded, and has equi-small tails. In particular, for every $\varepsilon > 0$, there exists some $N \in \mathbb{N}$ such that

$$\sum_{k > N} |\langle Tu, e_k \rangle|^2 < \varepsilon^2$$

for all $u$ satisfying $\|u\| \le 1$. We can thus define the partial sums

$$T_n u = \sum_{k=1}^{n} \langle Tu, e_k \rangle e_k;$$

this is a bounded linear operator because $\|T_n u\|^2 \le \|Tu\|^2 \le \|T\|^2 \|u\|^2$ by Bessel's inequality, and the range of $T_n$ is contained within the span of $\{e_1, \dots, e_n\}$, so $T_n$ is a finite rank operator for each $n$. It suffices to show that this choice of $T_n$ does converge to $T$ as $n \to \infty$: indeed, for any $\varepsilon > 0$, we can let $N$ be as above in the equi-small tails condition. Then we have, for any $\|u\| = 1$ and any $n \ge N$, that

$$\|T_n u - Tu\|^2 = \Big\| \sum_{k=1}^{n} \langle Tu, e_k \rangle e_k - \sum_{k=1}^{\infty} \langle Tu, e_k \rangle e_k \Big\|^2,$$

and combining terms and using orthonormality yields

$$= \Big\| \sum_{k > n} \langle Tu, e_k \rangle e_k \Big\|^2 = \sum_{k > n} |\langle Tu, e_k \rangle|^2 \le \sum_{k > N} |\langle Tu, e_k \rangle|^2 < \varepsilon^2.$$

Taking the supremum over all $u$ with $\|u\| = 1$ and then taking a square root yields $\|T_n - T\| \le \varepsilon$ for all $n \ge N$, as desired.

For the opposite direction, we will use our second characterization of compact sets (approximating using finite-dimensional subspaces). Suppose we know that $\|T_n - T\| \to 0$, where each $T_n$ is a finite rank operator. Then $\overline{\{Tu : \|u\| \le 1\}}$ is closed, and because it is contained in the set $\{v : \|v\| \le \|T\|\}$, it is bounded.

It suffices to show that for all $\varepsilon > 0$, there exists a finite-dimensional subspace $W$ such that for all $u$ with $\|u\| \le 1$, $\inf_{w \in W} \|Tu - w\| \le \varepsilon$. The idea is to approximate $T$ with $T_n$: there is some $N$ such that $\|T_N - T\| < \varepsilon$, and thus we can let $W$ be the range of $T_N$ (which is finite-dimensional). We then have, for any $\|u\| \le 1$,

$$\|Tu - T_N u\| \le \|T - T_N\| \, \|u\| \le \|T - T_N\| < \varepsilon,$$

and thus $\inf_{w \in W} \|Tu - w\| \le \varepsilon$ because $T_N u$ is an element of $W$. This means $T$ is compact.
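To see the approximation in Theorem 210 concretely, here is a small sketch for the diagonal operator of Example 206, restricted to the first `dim` coordinates (a finite truncation that is our own assumption, made only so the computation is finite). It relies on the fact that a diagonal operator's norm is the largest absolute diagonal entry, attained on a basis vector. All function names are hypothetical.

```python
import math


def apply_T(a):
    """Diagonal operator from Example 206 on a finite list: (a_1, a_2/2, a_3/3, ...)."""
    return [a_k / (k + 1) for k, a_k in enumerate(a)]


def apply_T_n(a, n):
    """Finite rank truncation T_n: keep the first n coordinates of T a, zero out the rest."""
    return [a_k / (k + 1) if k < n else 0.0 for k, a_k in enumerate(a)]


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def truncation_error_norm(n, dim=200):
    """Operator norm of T - T_n restricted to the first `dim` coordinates.

    T - T_n is diagonal, so its norm is the largest remaining diagonal entry,
    which we find by testing every standard basis vector.
    """
    best = 0.0
    for k in range(dim):
        e_k = [1.0 if i == k else 0.0 for i in range(dim)]
        diff = [t - tn for t, tn in zip(apply_T(e_k), apply_T_n(e_k, n))]
        best = max(best, norm(diff))
    return best


errors = {n: truncation_error_norm(n) for n in (1, 5, 20, 100)}
for n, err in errors.items():
    print(n, err)
```

In this truncated model the error for each $n$ equals $\frac{1}{n+1}$, which tends to 0 even though the rank of $T_n$ grows without bound.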


We can also go a bit into the algebraic structure for compact operators, much like we did for finite rank operators: 
Theorem 211
Let H be a separable Hilbert space, and let K(H) be the set of compact operators on H. Then we have the 


following: 


1. K(H) is a closed subspace of B(H). 


2. For any T € K(H), we also have T* € K(H). 


3. For any T € K(H) and A, B € B(H), we have ATB € K(H). 


In other words, the set of compact operators is also a star-closed, two-sided ideal in the algebra of bounded linear 


operators. 


Proof. Point (1) is clear because $K(H)$ is the closure of $R(H)$ (from above). For (2), notice that if $T \in K(H)$, then there exists a sequence of finite rank operators $T_n$ with $\|T_n - T\| \to 0$, meaning that $\|T_n^* - T^*\| \to 0$ (since the operator norm of the adjoint and the original operator are the same). Since $T_n^*$ is finite rank for each $n$, this means $T^*$ is indeed a compact operator.

Finally, for (3), we again take a sequence $T_n \to T$ of finite rank operators. Since we've already shown that condition (3) is satisfied by finite rank operators, $A T_n B$ is a finite rank operator for each $n$, and

$$\|A T_n B - A T B\| = \|A (T_n - T) B\| \le \|A\| \cdot \|T_n - T\| \cdot \|B\| \to 0$$

(because the operator norms of $A$ and $B$ are finite). Thus $A T_n B$ is a sequence of finite rank operators converging to $ATB$, and thus $ATB$ is compact.


We'll now turn to studying particular properties of our operators: some of the most important numbers we associate 
with matrices are the eigenvalues. In physics, the eigenvalues of the Hamiltonian operator (which may not be finite 
rank) give us the energy levels of the system, and we'll explain formally how we make that definition now, making a 


generalization of what we encounter in linear algebra. 


Proposition 212 


Let $T \in B(H)$ be a bounded linear operator. If $\|T\| < 1$, then $I - T$ is invertible, and we can compute its inverse to be the absolutely summable series

$$(I - T)^{-1} = \sum_{n=0}^{\infty} T^n.$$


We did this proof ourselves, and we can also use it to prove this next result: 


Proposition 213 


The space of invertible linear operators GL(H) = {T € B(H) : T invertible} is an open subset of B(H). 


Proof. Let $T_0 \in GL(H)$. Then we claim that any operator $T$ satisfying $\|T - T_0\| < \|T_0^{-1}\|^{-1}$ is invertible. Indeed,

$$\|T_0^{-1}(T_0 - T)\| \le \|T_0^{-1}\| \, \|T - T_0\| < 1,$$

so $I - T_0^{-1}(T_0 - T)$ is invertible by Proposition 212, meaning that $I - I + T_0^{-1}T = T_0^{-1}T$ is invertible, meaning that $T = T_0 (T_0^{-1} T)$ is invertible as well. Thus $T_0$ has an open neighborhood completely contained in $GL(H)$, meaning that $GL(H)$ is open.
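The series in Proposition 212 is easy to test in finite dimensions. The sketch below is an illustration under our own assumptions (a hypothetical $2 \times 2$ matrix `T` whose Frobenius norm, and hence operator norm, is below 1): it sums the first 60 powers of `T` and checks that the result inverts $I - T$ up to rounding.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def neumann_inverse(T, terms=60):
    """Partial sum I + T + ... + T^(terms-1), approximating (I - T)^{-1} when ||T|| < 1."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = IDENTITY
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, T)
    return total


# Hypothetical example matrix; its Frobenius norm is sqrt(0.30) < 1.
T = [[0.2, 0.3], [0.1, 0.4]]
approx_inverse = neumann_inverse(T)
I_minus_T = [[IDENTITY[i][j] - T[i][j] for j in range(2)] for i in range(2)]
product = mat_mul(I_minus_T, approx_inverse)
residual = max(abs(product[i][j] - IDENTITY[i][j]) for i in range(2) for j in range(2))
print(residual)
```

The residual is exactly the size of the discarded tail, since $(I - T)\sum_{n<N} T^n = I - T^N$.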
The reason we're talking about invertible linear operators here is that symmetric, real-valued matrices can be diagonalized, and we find those diagonal entries (eigenvalues) by studying the nullspace of $A - \lambda I$. So eigenvalues are basically impediments to the invertibility of $A - \lambda I$, and that's how we'll define our spectrum here:


Definition 214 
Let $A \in B(H)$ be a bounded linear operator. The resolvent set of $A$, denoted $\mathrm{Res}(A)$, is the set $\{\lambda \in \mathbb{C} : A - \lambda I \in GL(H)\}$, and the spectrum of $A$, denoted $\mathrm{Spec}(A)$, is the complement $\mathbb{C} \setminus \mathrm{Res}(A)$.

Notice that if $A - \lambda I \in GL(H)$, then we can always uniquely solve the equation $(A - \lambda I)u = v$ for any $v \in H$. We will often write $A - \lambda I$ as $A - \lambda$ for convenience.


Example 215 


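Let $A : \mathbb{C}^2 \to \mathbb{C}^2$ be the linear operator given in matrix form as $A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$. Then $A - \lambda = \begin{pmatrix} \lambda_1 - \lambda & 0 \\ 0 & \lambda_2 - \lambda \end{pmatrix}$ is not invertible exactly when $\lambda = \lambda_1, \lambda_2$, so the spectrum of $A$ is $\{\lambda_1, \lambda_2\}$ (and $\mathrm{Res}(A) = \mathbb{C} \setminus \{\lambda_1, \lambda_2\}$).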


In other words, the spectrum behaves as we expect it to for finite-dimensional operators, but there is an extra 


wrinkle for infinite-dimensional operators which we'll see soon. 


Definition 216 
If $A \in B(H)$ and $A - \lambda$ is not injective, then there exists some $u \in H \setminus \{0\}$ with $Au = \lambda u$, and we call $\lambda$ an eigenvalue of $A$ and $u$ the associated eigenvector.


Example 217 


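If we return to our compact operator $T : \ell^2 \to \ell^2$ sending $a \mapsto \left(a_1, \frac{a_2}{2}, \frac{a_3}{3}, \dots\right)$, then the $n$th basis vector $e_n$ is an eigenvector of $T$ with eigenvalue $\frac{1}{n}$, so the spectrum contains at least the set $\{\frac{1}{n} : n \in \mathbb{N}\}$.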


But there's also an additional point in the spectrum of $T$ that we missed in this argument: it turns out that 0 is also in the spectrum, despite there being no nonzero vectors satisfying $Tv = 0$ (so 0 is not an eigenvalue). This is because while the operator $T - 0 = T$ is indeed injective, it is not surjective and thus not invertible. In particular, the inverse of $T$ would need to map $a \mapsto (a_1, 2a_2, 3a_3, \dots)$, and this is not a bounded linear operator. So 0 is not in the resolvent set, and thus it is in the spectrum, and this is an additional complication because of the infinite-dimensional structure. (The root of what's going on is that the range of $T$ can be dense but not closed.)
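To make the failure of surjectivity concrete, here is a short worked check. The specific vector $v$ is our own choice for illustration and does not come from the lecture.

```latex
v = \left(1, \tfrac{1}{2}, \tfrac{1}{3}, \dots\right) \in \ell^2
\quad \text{since} \quad \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty,
\qquad \text{but} \qquad
Ta = v \iff \frac{a_k}{k} = \frac{1}{k} \text{ for all } k \iff a = (1, 1, 1, \dots) \notin \ell^2 .
```

So $v$ is not in the range of $T$, even though the range contains every finitely supported sequence and is therefore dense in $\ell^2$.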


Also in contrast to the finite-dimensional case, it’s also possible for an operator to have no eigenvalues at all: 


Example 218 
Let $T : L^2([0, 1]) \to L^2([0, 1])$ be defined via $Tf(x) = x f(x)$. Then $T$ has no eigenvalues, but the spectrum is $\mathrm{Spec}(T) = [0, 1]$ (so again we see a discrepancy between eigenvalues and the spectrum).
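One way to see both claims in this example is sketched below. This is a standard argument rather than the one given in lecture, and the functions $f_n$ are our own notation.

```latex
% No eigenvalues: an eigenfunction would have to vanish almost everywhere.
(T - \lambda) f = 0 \iff (x - \lambda) f(x) = 0 \ \text{for a.e. } x \in [0,1] \iff f = 0 \ \text{a.e.}

% Every \lambda \in [0,1] is in the spectrum: take normalized indicator functions near \lambda.
I_n = [0,1] \cap \left[\lambda - \tfrac{1}{n}, \lambda + \tfrac{1}{n}\right], \qquad
f_n = |I_n|^{-1/2} \, \mathbf{1}_{I_n}, \qquad \|f_n\| = 1,

\|(T - \lambda) f_n\|^2 = \frac{1}{|I_n|} \int_{I_n} (x - \lambda)^2 \, dx \le \frac{1}{n^2} \to 0 .
```

If $T - \lambda$ had a bounded inverse, then $1 = \|f_n\| \le \|(T - \lambda)^{-1}\| \, \|(T - \lambda) f_n\| \to 0$, which is impossible. For $\lambda \notin [0,1]$, multiplication by $\frac{1}{x - \lambda}$ is a bounded inverse, so the spectrum is exactly $[0,1]$.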


Theorem 219 
Let $A \in B(H)$. Then $\mathrm{Spec}(A)$ is a closed subset of $\mathbb{C}$, and $\mathrm{Spec}(A) \subset \{\lambda \in \mathbb{C} : |\lambda| \le \|A\|\}$.

In particular, this means the spectrum is a compact subset of the complex numbers. This is another way we can understand that if $\{\frac{1}{n} : n \in \mathbb{N}\}$ is in our spectrum, the limit point 0 must also be.


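Proof. It is equivalent to show that the resolvent set $\mathrm{Res}(A)$ is open and contains the set $\{\lambda \in \mathbb{C} : |\lambda| > \|A\|\}$. To show openness, let $\lambda_0 \in \mathrm{Res}(A)$, meaning that $A - \lambda_0$ is invertible. Since $GL(H)$ is open, there exists some $\varepsilon > 0$ such that $\|T - (A - \lambda_0)\| < \varepsilon \implies T \in GL(H)$. Now because

$$|\lambda - \lambda_0| < \varepsilon \implies \|\lambda I - \lambda_0 I\| < \varepsilon \implies \|(A - \lambda) - (A - \lambda_0)\| < \varepsilon,$$

this means that for all $\lambda$ in an $\varepsilon$-neighborhood of $\lambda_0$, we have $\lambda \in \mathrm{Res}(A)$, and that shows openness.

We now want to show that if $|\lambda| > \|A\|$, then $A - \lambda$ is invertible. Indeed, for any $|\lambda| > \|A\|$ (in particular $\lambda$ is nonzero), we have $I - \frac{1}{\lambda}A$ invertible by Proposition 212. Thus

$$A - \lambda = -\lambda \left(I - \frac{1}{\lambda} A\right)$$

is indeed invertible, and thus $\lambda \in \mathrm{Res}(A)$ as desired. $\square$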


Remark 220. It makes sense to ask whether the spectrum can be empty, and the answer is no. This requires some complex analysis: if the spectrum were empty, then for all $u, v \in H$, $f(\lambda) = \langle (A - \lambda)^{-1} u, v \rangle$ is a continuous, complex differentiable function in $\lambda$ on $\mathbb{C}$. As $\lambda$ gets large, the operator norm of $(A - \lambda)^{-1}$ goes to 0, but now Liouville's theorem tells us that because $f(\lambda) \to 0$ as $|\lambda| \to \infty$, our function must be identically zero, which means that $(A - \lambda)^{-1} = 0$, a contradiction.


For our purposes going forward, though, we'll focus on self-adjoint operators, and it'll be useful to have a better 


characterization of them. 


Theorem 221 
If we have a self-adjoint operator $A \in B(H)$, meaning that $A = A^*$, then $\langle Au, u \rangle$ is real for all $u$, and $\|A\| = \sup_{\|u\|=1} |\langle Au, u \rangle|$.


Proof. The first fact is easy to show: notice that

$$\langle Au, u \rangle = \overline{\langle u, Au \rangle} = \overline{\langle u, A^* u \rangle} = \overline{\langle Au, u \rangle}$$

using the definition of the inner product and the adjoint. For the second fact, let $a = \sup_{\|u\|=1} |\langle Au, u \rangle|$. For all $\|u\| = 1$, we have (by Cauchy-Schwarz)

$$|\langle Au, u \rangle| \le \|Au\| \cdot \|u\| \le \|A\|.$$

So taking a supremum over all $u$, we find that $a$ is a finite number, and $a \le \|A\|$. To finish, it suffices to prove the other inequality. For any $u \in H$ satisfying $\|u\| = 1$ such that $Au \ne 0$ (there is some $u$ for which this is true, otherwise $A$ is the zero operator and the result is clear), we can define the unit-length vector $v = \frac{Au}{\|Au\|}$ and

$$\|Au\| = \frac{\langle Au, Au \rangle}{\|Au\|} = \langle Au, v \rangle = \mathrm{Re}\, \langle Au, v \rangle,$$

and we can verify ourselves that this can be written as

$$= \frac{1}{4} \mathrm{Re} \Big[ \langle A(u + v), u + v \rangle - \langle A(u - v), u - v \rangle + i \big( \langle A(u + iv), u + iv \rangle - \langle A(u - iv), u - iv \rangle \big) \Big].$$

Now the $i \big( \langle A(u + iv), u + iv \rangle - \langle A(u - iv), u - iv \rangle \big)$ part is purely imaginary, since $\langle A(u \pm iv), u \pm iv \rangle$ are real by the first part of this result, and thus those two terms drop out when we take the real part. We're left with

$$= \frac{1}{4} \big( \langle A(u + v), u + v \rangle - \langle A(u - v), u - v \rangle \big),$$

and now using the fact that $|\langle Au, u \rangle| \le a$ for any unit-length $u$, meaning that $|\langle Au, u \rangle| \le a \|u\|^2$ for all $u$, we can bound this as

$$\le \frac{1}{4} \big( a \|u + v\|^2 + a \|u - v\|^2 \big),$$

and by the parallelogram law this simplifies to (because $\|u\| = \|v\| = 1$)

$$= \frac{a}{4} \cdot 2 \big( \|u\|^2 + \|v\|^2 \big) = a.$$

Thus $\|Au\| \le a$ for all unit-length $u$, meaning that $\|A\| \le a$ as desired.


Remark 222. In quantum mechanics, observables (like position, momentum, and so on) are modeled by self-adjoint unbounded operators, and the point is that all things measured in nature (the associated eigenvalues) are real. So there are applications of all of our discussions here to physics!


We'll discuss more about the spectrum of self-adjoint operators next time, seeing that it must be contained in R 


and also within certain bounds involving (Au, u). 


21 May 6, 2021 


We'll continue discussing properties of the spectrum of a bounded linear operator today: recall that the resolvent set of an operator $A$ is the set of complex numbers $\lambda$ such that $A - \lambda$ is an element of $GL(H)$ (in other words, $A - \lambda$ is bijective, meaning it has a bounded inverse), and the spectrum of $A$ is the complement of the resolvent set in $\mathbb{C}$. While the spectrum is just the set of eigenvalues for matrices in a finite-dimensional vector space, there's a more subtle distinction to be made now: we define $\lambda \in \mathrm{Spec}(A)$ to be an eigenvalue if there is some nonzero vector $u$ with $(A - \lambda)u = 0$, so $\lambda$ is in the spectrum because $A - \lambda$ is not injective. But there are other reasons why $\lambda$ might be in the spectrum as well, for instance if the image is not closed.

Last time, we proved that the spectrum is closed and is contained within the ball of radius $\|A\|$, meaning that it is compact. We then focused our attention on self-adjoint operators, and that's where we'll be directing our study today. We proved last lecture that a self-adjoint bounded linear operator $A$ always has $\langle Au, u \rangle$ real, and that it satisfies $\|A\| = \sup_{\|u\|=1} |\langle Au, u \rangle|$. Here's our next result:


Theorem 223 
Let $A = A^* \in B(H)$ be a self-adjoint operator. Then the spectrum $\mathrm{Spec}(A) \subset [-\|A\|, \|A\|]$ is contained within a line segment on the real line, and at least one of $\pm\|A\|$ is in $\mathrm{Spec}(A)$.


Proof. First, we'll show the first property (that the spectrum is contained within this given line segment). We know from last time that $\mathrm{Spec}(A) \subset \{|\lambda| \le \|A\|\}$, so we just need to show that $\mathrm{Spec}(A) \subset \mathbb{R}$ (in other words, any complex number with a nonzero imaginary part is in the resolvent set). Write $\lambda = s + it$ for $s, t$ real and $t \ne 0$, so that

$$A - \lambda = (A - s) - it = \tilde{A} - it,$$

where $\tilde{A} = A - s$ is another self-adjoint bounded linear operator because $(A - sI)^* = A^* - (sI)^* = A - sI$. So it suffices to show that $\tilde{A} - it$ is bijective, and we'll switch our notation back to using $A$ instead of $\tilde{A}$.

Note that because $\langle Au, u \rangle$ is real,

$$\mathrm{Im}\big( \langle (A - it)u, u \rangle \big) = \mathrm{Im}\big( \langle -itu, u \rangle \big) = -t \|u\|^2,$$

so $(A - it)u = 0$ only if $u = 0$ (since that's the only instance where the right-hand side is zero). Therefore, $A - it$ is injective, and we just need to show that it is surjective. Notice that $(A - it)^* = A + it$ is also injective by the same argument, so

$$\mathrm{Range}(A - it)^\perp = \mathrm{Null}((A - it)^*) = \{0\}.$$

And now we can use what we know about orthogonal complements:

$$\overline{\mathrm{Range}(A - it)} = (\mathrm{Range}(A - it)^\perp)^\perp = \{0\}^\perp = H,$$

so it suffices to show that the range of $A - it$ is closed. To show that, suppose we have a sequence of elements $u_n$ such that $(A - it)u_n \to v$; we want to show that $v \in \mathrm{Range}(A - it)$. We know from the calculation above that

$$|t| \cdot \|u_n - u_m\|^2 = \big| \mathrm{Im}\, \langle (A - it)(u_n - u_m), u_n - u_m \rangle \big| \le \big| \langle (A - it)(u_n - u_m), u_n - u_m \rangle \big|,$$

and by Cauchy-Schwarz this is bounded by

$$\le \|(A - it)u_n - (A - it)u_m\| \cdot \|u_n - u_m\|.$$

Simplifying the first and last expressions, we find that

$$\|u_n - u_m\| \le \frac{1}{|t|} \|(A - it)u_n - (A - it)u_m\|.$$

Since $t$ is a fixed constant, and our sequence $\{(A - it)u_n\}$ converges, it is also Cauchy. In particular, for any $\varepsilon > 0$, we can find some $N$ so that the right-hand side is smaller than $\varepsilon$ as long as $n, m \ge N$, and that same $N$ shows that our sequence $\{u_n\}$ is also Cauchy. Therefore, there exists some $u \in H$ so that $u_n \to u$ by completeness of our Hilbert space, and now we're done: since $A - it$ is a bounded and thus continuous linear operator,

$$(A - it)u = \lim_{n \to \infty} (A - it)u_n = v.$$

So the range is closed, and combining this with our previous work, $A - it$ is surjective. This finishes our proof that $A - it$ is bijective and thus complex numbers with nonzero imaginary part are in the resolvent set.

Now for the second property, since we have shown that $\|A\| = \sup_{\|u\|=1} |\langle Au, u \rangle|$, there must be a sequence of unit vectors $\{u_n\}$ such that $|\langle Au_n, u_n \rangle| \to \|A\|$. Since each term in this sequence is real, there must be a subsequence of these $\{u_n\}$ with $\langle Au_n, u_n \rangle$ converging to $\|A\|$ or to $-\|A\|$, which means that we have

$$\langle (A \mp \|A\|) u_n, u_n \rangle \to 0$$

as $n \to \infty$ (this notation means one of $-$ or $+$, depending on whether we had convergence to $\|A\|$ or $-\|A\|$). This in fact forces $(A \mp \|A\|)u_n \to 0$, because

$$\|(A \mp \|A\|) u_n\|^2 = \|Au_n\|^2 \mp 2\|A\| \langle Au_n, u_n \rangle + \|A\|^2 \le 2\|A\|^2 \mp 2\|A\| \langle Au_n, u_n \rangle \to 0.$$

We claim that this means $A \mp \|A\|$ is not invertible: assume for the sake of contradiction that it were invertible. Then

$$1 = \|u_n\| = \|(A \mp \|A\|)^{-1} (A \mp \|A\|) u_n\| \le \|(A \mp \|A\|)^{-1}\| \cdot \|(A \mp \|A\|) u_n\|,$$

but the right-hand side converges to 0 as $n \to \infty$, contradiction. So $A \mp \|A\|$ is not bijective, and thus one of $\pm\|A\|$ must be in the spectrum of $A$, finishing the proof.
We can in fact strengthen this bound even more:

Theorem 224
If $A = A^* \in B(H)$ is a self-adjoint bounded linear operator, and we define $a_- = \inf_{\|u\|=1} \langle Au, u \rangle$ and $a_+ = \sup_{\|u\|=1} \langle Au, u \rangle$, then $a_\pm$ are both contained in $\mathrm{Spec}(A)$, which is contained within $[a_-, a_+]$.


Proof. Applying a similar strategy as before, we know that because $-\|A\| \le \langle Au, u \rangle \le \|A\|$ for all unit vectors $u$, we must have $-\|A\| \le a_- \le a_+ \le \|A\|$ (by taking the infimum and supremum of the middle quantity). Now by the definition of $a_-, a_+$, there exist two sequences $\{u_n^\pm\}$ of unit vectors so that $\langle Au_n^\pm, u_n^\pm \rangle \to a_\pm$. And the argument we just gave works here very similarly: since we know that

$$\langle (A - a_\pm) u_n^\pm, u_n^\pm \rangle \to 0,$$

this implies that $a_+$ and $a_-$ are both in the spectrum because we have convergence to both points.

It remains to show that the spectrum is contained within $[a_-, a_+]$. Let $b = \frac{a_- + a_+}{2}$ be their midpoint, and let $B = A - bI$. Since $b$ is a real number, $B$ is also a bounded self-adjoint operator, so by Theorem 223, we know that

$$\mathrm{Spec}(B) \subset [-\|B\|, \|B\|].$$

This means that (shifting by $bI$)

$$\mathrm{Spec}(A) \subset [-\|B\| + b, \|B\| + b],$$

and we can finish by noticing that

$$\|B\| = \sup_{\|u\|=1} |\langle Bu, u \rangle| = \sup_{\|u\|=1} \left| \langle Au, u \rangle - \frac{a_- + a_+}{2} \right|.$$

Since $\langle Au, u \rangle$ always lies in the line segment $[a_-, a_+]$ (getting arbitrarily close to the endpoints), and $\frac{a_- + a_+}{2}$ is their midpoint, this supremum will be half the length of that line segment, meaning that

$$\|B\| = \frac{a_+ - a_-}{2} \implies \mathrm{Spec}(A) \subset [-\|B\| + b, \|B\| + b] = [a_-, a_+],$$

as desired, completing the proof.
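In finite dimensions, Theorems 221 and 224 can be checked by brute force. The sketch below uses a hypothetical real symmetric matrix with eigenvalues 1 and 3 (our own choice) and samples $\langle Au, u \rangle$ over real unit vectors. Restricting to real vectors is an assumption that is harmless for real symmetric matrices.

```python
import math

# Hypothetical symmetric matrix; its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]


def quadratic_form(A, u):
    """Compute <Au, u> for a real 2x2 matrix A and a real vector u."""
    Au = [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]
    return Au[0] * u[0] + Au[1] * u[1]


# Unit vectors in R^2 are (cos t, sin t); sampling t in [0, pi) covers them up to sign.
samples = 3600
values = [
    quadratic_form(A, (math.cos(math.pi * k / samples), math.sin(math.pi * k / samples)))
    for k in range(samples)
]
a_minus, a_plus = min(values), max(values)
operator_norm = max(abs(a_minus), abs(a_plus))  # Theorem 221: ||A|| = sup |<Au, u>|
print(a_minus, a_plus, operator_norm)
```

The sampled extremes agree with the smallest and largest eigenvalues, so the spectrum $\{1, 3\}$ sits inside $[a_-, a_+]$ with both endpoints attained.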


Corollary 225 


Let $A^* = A \in B(H)$ be a self-adjoint linear operator. Then $\langle Au, u \rangle \ge 0$ for all $u$ if and only if $\mathrm{Spec}(A) \subset [0, \infty)$.

(This can be shown by basically walking through the logic for what $a_-$ needs to be under either of these conditions.)
We'll now move on to the spectral theory for self-adjoint compact operators: the short answer is that we essentially see just the eigenvalues, with the exception of zero being a possible accumulation point. And in particular, the spectrum will be countable, and this should make sense because compact operators are the limit of finite rank operators, so we don't expect to end up with wildly different behavior in the limit.


Definition 226 
Let $A \in B(H)$ be a bounded linear operator. We denote $E_\lambda$ to be the nullspace of $A - \lambda$, or equivalently the set of eigenvectors $\{u \in H : (A - \lambda)u = 0\}$.
Theorem 227
Suppose $A^* = A \in B(H)$ is a compact self-adjoint operator. Then we have the following:

1. If $\lambda \ne 0$ is an eigenvalue of $A$, then $\lambda \in \mathbb{R}$ and $\dim E_\lambda$ is finite.

2. If $\lambda_1 \ne \lambda_2$ are eigenvalues of $A$, then $E_{\lambda_1}$ and $E_{\lambda_2}$ are orthogonal to each other (every element in $E_{\lambda_1}$ is orthogonal to every element in $E_{\lambda_2}$).

3. The set of nonzero eigenvalues of $A$ is either finite or countably infinite, and if it is countably infinite and given by a sequence $\{\lambda_n\}_n$, then $|\lambda_n| \to 0$.


Proof. For (1), let $\lambda$ be a nonzero eigenvalue. Suppose for the sake of contradiction that $E_\lambda$ is infinite-dimensional. Then by the Gram-Schmidt process, there exists a countable collection $\{u_n\}_n$ of orthonormal elements of $E_\lambda$. Since $A$ is a compact operator, this means that $\{Au_n\}_n$ must have a convergent subsequence, and in particular that means we have a Cauchy sequence $\{Au_{n_j}\}_j$. But we can calculate

$$\|Au_{n_j} - Au_{n_k}\|^2 = \|\lambda u_{n_j} - \lambda u_{n_k}\|^2 = |\lambda|^2 \|u_{n_j} - u_{n_k}\|^2 = 2|\lambda|^2,$$

so the distance between elements of the sequence does not go to 0 for large $j, k$, a contradiction. Thus $E_\lambda$ is finite-dimensional. To show that $\lambda$ must be real, notice that we can pick a unit-length eigenvector $u$ satisfying $Au = \lambda u$, and then we have

$$\lambda = \lambda \langle u, u \rangle = \langle \lambda u, u \rangle = \langle Au, u \rangle,$$

and we've already shown that this last inner product must be real, so $\lambda$ is real.

For (2), suppose $\lambda_1 \ne \lambda_2$, and suppose $u_1 \in E_{\lambda_1}$, $u_2 \in E_{\lambda_2}$. Then

$$\lambda_1 \langle u_1, u_2 \rangle = \langle \lambda_1 u_1, u_2 \rangle = \langle Au_1, u_2 \rangle,$$

and now because $A$ is self-adjoint, this is

$$= \langle u_1, Au_2 \rangle = \langle u_1, \lambda_2 u_2 \rangle = \lambda_2 \langle u_1, u_2 \rangle$$

(no complex conjugate because eigenvalues are real). Therefore, we must have $(\lambda_1 - \lambda_2) \langle u_1, u_2 \rangle = 0$, so (because $\lambda_1 - \lambda_2 \ne 0$) $\langle u_1, u_2 \rangle = 0$ and we've shown the desired orthogonality.

Finally, for (3), let $\Lambda = \{\lambda \ne 0 : \lambda \text{ eigenvalue of } A\}$. We need to show that $\Lambda$ is either finite or countably infinite, and we claim that we can actually prove both parts of (3) simultaneously by showing that if $\{\lambda_n\}_n$ is a sequence of distinct eigenvalues of $A$, then $\lambda_n \to 0$. This is because the set

$$\Lambda_N = \left\{ \lambda \in \Lambda : |\lambda| \ge \frac{1}{N} \right\}$$

is a finite set for each $N$ (otherwise we could take any sequence of distinct elements in $\Lambda_N$, and that can't converge to 0), and thus $\Lambda = \bigcup_{N \in \mathbb{N}} \Lambda_N$ is a countable union of finite sets and thus countable.

In order to prove this claim, let $\{u_n\}_n$ be the associated unit-length eigenvectors of our eigenvalues $\lambda_n$. Then

$$|\lambda_n| = \|\lambda_n u_n\| = \|Au_n\|,$$

so we further reduce the problem to showing that $\|Au_n\| \to 0$. But showing this is a consequence of us having an orthonormal sequence of vectors and $A$ being compact: suppose that $\|Au_n\|$ does not converge to 0. Then there exists some $\varepsilon_0 > 0$ and a subsequence $\{Au_{n_j}\}$ so that for all $j$, $\|Au_{n_j}\| \ge \varepsilon_0$. Then because $A$ is a compact operator, there exists a further subsequence $e_k = u_{n_{j_k}}$ such that $\{Ae_k\}_k$ converges in $H$. Since $e_k$ and $e_\ell$ are eigenvectors that correspond to distinct eigenvalues, they are orthogonal, and therefore $Ae_k$ and $Ae_\ell$ are also orthogonal. But now if $f = \lim_{k \to \infty} Ae_k$, then

$$\|f\| = \lim_{k \to \infty} \|Ae_k\| \ge \varepsilon_0,$$

meaning that by continuity of the inner product,

$$\varepsilon_0^2 \le \|f\|^2 = \langle f, f \rangle = \lim_{k \to \infty} \langle Ae_k, f \rangle = \lim_{k \to \infty} \langle e_k, Af \rangle.$$

And because the sequence $\langle e_k, Af \rangle$ gives us the Fourier coefficients of $Af$, the sum of their squares should be finite (by Bessel's inequality, it's at most $\|Af\|^2 < \infty$). This contradicts the fact that the limit of the Fourier coefficients is at least $\varepsilon_0^2$. So our original assumption is wrong, and $\|Au_n\|$ must converge to 0, proving the claim.


22 May 11, 2021 


We'll continue our discussion of spectral theory for self-adjoint compact operators today. We should recall that the spectrum of a bounded linear operator is a generalization of the set of eigenvalues, and it is defined as the set of $\lambda \in \mathbb{C}$ such that $A - \lambda$ is not invertible. We discussed previously that for a self-adjoint operator, the spectrum is contained within a line segment on the real line, and in the finite-dimensional case we can choose a basis of eigenvectors in which the operator is diagonal. We'll prove that something similar holds in the infinite-dimensional case, as long as we have compact operators (which makes sense, since they're the limit of finite-rank operators). But we'll prove some other results along the way first, based off of some of the examples we've been presenting.


Theorem 228 (Fredholm alternative) 
Let $A = A^* \in B(H)$ be a self-adjoint compact operator, and let $\lambda \in \mathbb{R} \setminus \{0\}$. Then $\mathrm{Range}(A - \lambda)$ is closed, meaning that

$$\mathrm{Range}(A - \lambda) = (\mathrm{Range}(A - \lambda)^\perp)^\perp = \mathrm{Null}(A - \lambda)^\perp.$$

Thus, either $A - \lambda$ is bijective, or the nullspace of $A - \lambda$ (the eigenspace corresponding to $\lambda$) is nontrivial and finite-dimensional.


This result basically tells us when we can solve the equation $(A - \lambda)u = f$: we can do so if and only if $f$ is orthogonal to the nullspace of $A - \lambda$. The finite-dimensional part of this theorem comes from Theorem 227, and it is useful because we can check orthogonality by taking a finite basis of the nullspace of $A - \lambda$.

A further consequence here is that because the spectrum of a self-adjoint $A$ is a subset of the reals, we have

$$\mathrm{Spec}(A) \setminus \{0\} = \{\text{nonzero eigenvalues of } A\},$$

since for a nonzero $\lambda$ in the spectrum, $A - \lambda$ can only fail to be bijective because we have an eigenvector. And because the eigenvalue set is finite or countably infinite, it can only be countably infinite if those eigenvalues converge to zero.
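As a concrete instance of the solvability criterion, consider again the diagonal operator $T a = (a_1, \frac{a_2}{2}, \frac{a_3}{3}, \dots)$ on $\ell^2$, which is compact and self-adjoint, with $\lambda = \frac{1}{2}$. This worked example is our own illustration and is not taken from the lecture.

```latex
\mathrm{Null}\left(T - \tfrac{1}{2}\right) = \mathrm{span}\{e_2\}, \qquad
\left(T - \tfrac{1}{2}\right) u = f \iff \left(\tfrac{1}{k} - \tfrac{1}{2}\right) u_k = f_k \ \text{for all } k .

% The k = 2 equation reads 0 = f_2, so a solution exists only if f \perp e_2. In that case
u_k = \frac{f_k}{\tfrac{1}{k} - \tfrac{1}{2}} \quad (k \ne 2), \qquad u_2 \ \text{arbitrary},
\qquad \left| \tfrac{1}{k} - \tfrac{1}{2} \right| \ge \tfrac{1}{6} \ \text{for } k \ne 2
\implies \|u\| \le 6 \|f\| \ \text{when } u_2 = 0 .
```

So the equation is solvable exactly when $f$ is orthogonal to the one-dimensional nullspace, and the solution is unique up to adding multiples of $e_2$.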


Proof. We need to show that the range of $A - \lambda$ is closed if $\lambda \ne 0$. Suppose we have a sequence of elements $(A - \lambda)u_n$ that converge to $f \in H$, and we need to show that $f$ is also in the range of $A - \lambda$.

It is not true that the $u_n$ will necessarily converge, but we'll find a way to extract a relevant subsequence. We can first define

$$v_n = \Pi_{\mathrm{Null}(A - \lambda)^\perp} u_n,$$

the projection onto the orthogonal complement of $\mathrm{Null}(A - \lambda)$. Then we can use the direct sum decomposition of vectors into $\mathrm{Null}(A - \lambda)$ and its orthogonal complement, and we find that

$$(A - \lambda)u_n = (A - \lambda)\big( \Pi_{\mathrm{Null}(A - \lambda)} u_n + v_n \big) = (A - \lambda)v_n.$$

So we can take away some noise and just consider a sequence $(A - \lambda)v_n \to f$, where the $v_n$ all live in the orthogonal complement of $\mathrm{Null}(A - \lambda)$.

We now claim that $\{v_n\}$ is bounded. Suppose otherwise. Then there exists some subsequence $\{v_{n_j}\}$ such that $\|v_{n_j}\| \to \infty$ as $j \to \infty$, so

$$(A - \lambda) \frac{v_{n_j}}{\|v_{n_j}\|} = \frac{1}{\|v_{n_j}\|} (A - \lambda)v_{n_j} \to 0 \cdot f = 0$$

as $j \to \infty$, using the definition of our sequences and the fact that the norm diverges. Because $A$ is a compact operator, there now exists some further subsequence, which we'll denote $\{v_{n_k}\}$, such that $\left\{ A \frac{v_{n_k}}{\|v_{n_k}\|} \right\}$ converges. But because

$$\frac{v_{n_k}}{\|v_{n_k}\|} = \frac{1}{\lambda} \left( A \frac{v_{n_k}}{\|v_{n_k}\|} - (A - \lambda) \frac{v_{n_k}}{\|v_{n_k}\|} \right)$$

and the second term on the right converges to 0 and the first converges based on our choice of subsequence, we find that the sequence of terms on the left-hand side, $\left\{ \frac{v_{n_k}}{\|v_{n_k}\|} \right\}$, must converge to some element $v$ which is also in $\mathrm{Null}(A - \lambda)^\perp$ (because said set is closed and our definition of the $v_n$ means that all terms are indeed in $\mathrm{Null}(A - \lambda)^\perp$). This gives us a contradiction, because $\|v\| = \lim_{k \to \infty} \frac{\|v_{n_k}\|}{\|v_{n_k}\|} = 1$, and

$$(A - \lambda)v = \lim_{k \to \infty} (A - \lambda) \frac{v_{n_k}}{\|v_{n_k}\|} = 0$$

by the choice of our further subsequence. Putting this all together, $v$ is both in the nullspace of $A - \lambda$ and also its orthogonal complement, so $v = 0$, contradicting the fact that $\|v\| = 1$. Thus our sequence $\{v_n\}$ must be bounded.

So now returning to what we wanted to prove, because $\{v_n\}$ is bounded and $A$ is a compact operator, there exists a subsequence $\{v_{n_j}\}$ (a completely different subsequence from before) so that $\{Av_{n_j}\}$ converges. (The definition of compactness tells us facts about the unit ball, but we can always scale to a ball of any finite radius.) And now by the same trick as before,

$$v_{n_j} = \frac{1}{\lambda} \big( Av_{n_j} - (A - \lambda)v_{n_j} \big)$$

has both terms on the right converging, so $v_{n_j} \to v$ for some $v \in H$. And now we know that $(A - \lambda)v_n$ converges to $f$, so because convergence still holds when we restrict to a subsequence, we have

$$f = \lim_{j \to \infty} (A - \lambda)v_{n_j} = (A - \lambda)v$$

(since $A - \lambda$ is a bounded and thus continuous linear operator), and we're done because $f$ is now in the range of $A - \lambda$.


Remark 229. We did not actually use the fact that $A$ is a self-adjoint operator in this argument. The fact that $\mathrm{Range}(A - \lambda)$ is closed is still true if $A$ is just a compact operator, but the consequences of that fact only apply for self-adjoint operators.


We've also shown previously that one of $\pm\|A\|$ must be in the spectrum, and that gives us this next result:

Theorem 230

Let $A = A^*$ be a nontrivial compact self-adjoint operator. Then $A$ has a nontrivial eigenvalue $\lambda_1$ with $|\lambda_1| = \sup_{\|u\|=1} |\langle Au, u \rangle| = |\langle Au_1, u_1 \rangle|$, where $u_1$ is a normalized eigenvector (with $\|u_1\| = 1$) satisfying $Au_1 = \lambda_1 u_1$.


Proof. Since at least one of $\pm\|A\|$ is in $\mathrm{Spec}(A)$ (and $\|A\| \ne 0$ because we have a nontrivial operator), at least one of them will be an eigenvalue of $A$ by the Fredholm alternative, and we'll let this be $\lambda_1$. The equation for $\lambda_1$ follows from the fact that we generally have

$$\|A\| = \sup_{\|u\|=1} |\langle Au, u \rangle|,$$

and the equality with $|\langle Au_1, u_1 \rangle|$ comes from the fact that being an eigenvalue implies that we have an eigenvector.


We'll now keep going: it turns out we can keep building up eigenvalues in this way, because eigenvectors of different eigenvalues are orthogonal. This will lead us to constructing an orthonormal basis in the way that we alluded to at the beginning of class.


Theorem 231 (Maximum principle) 

Let $A = A^*$ be a self-adjoint compact operator. Then the nonzero eigenvalues of $A$ can be ordered as $|\lambda_1| \ge |\lambda_2| \ge \cdots$ (including multiplicity), such that we have pairwise orthonormal eigenvectors $\{u_k\}$ for $\lambda_k$, satisfying

$$|\lambda_j| = \sup_{\substack{\|u\| = 1 \\ u \in \mathrm{Span}(u_1, \dots, u_{j-1})^\perp}} |\langle Au, u \rangle| = |\langle Au_j, u_j \rangle|.$$

Furthermore, we have $|\lambda_j| \to 0$ as $j \to \infty$ if the sequence of nonzero eigenvalues does not terminate.


In other words, after we find $\lambda_1$ (which will be the eigenvalue with largest magnitude) through our previous result, we can look at the orthogonal complement to all of the eigenvectors so far and get the eigenvector of next largest magnitude, and we can keep repeating this process.


Proof. We already know that we have countably many eigenvalues and that each one has a finite-dimensional eigenspace, so the fact that they can be ordered is not new information. And the fact that $|\lambda_j| \to 0$ has already been proved in a previous lecture as well (for the case of distinct eigenvalues, but it still holds when each eigenvalue has finite multiplicity), so the only new result is the equation for computing $|\lambda_j|$.

We will show that the equation holds by constructing our eigenvalues inductively. First of all, we can construct $\lambda_1$ and $u_1$ using our previous theorem (finding an eigenvalue of largest magnitude and its corresponding eigenvector), so the base case is satisfied. For the inductive step, suppose that we have found $\lambda_1, \dots, \lambda_n$, along with orthonormal eigenvectors $u_1, \dots, u_n$, satisfying the equation for $|\lambda_j|$ in the maximum principle. We now have two cases: in the first case, we have

$$Au = \sum_{k=1}^{n} \lambda_k \langle u, u_k \rangle u_k \quad \text{for all } u \in H,$$

so we've found all of the eigenvalues and the process terminates (because $A$ is a finite-rank operator). But in the other case, the equality above doesn't hold for every $u$. So if we want to find $\lambda_{n+1}$, we can define a linear operator $A_n$ (which is not identically zero) via

$$A_n u = Au - \sum_{k=1}^{n} \lambda_k \langle u, u_k \rangle u_k.$$

We can check that $A_n$ is a self-adjoint compact operator (because $A$ is self-adjoint and the $\lambda_k$ are real numbers, and $A_n$ is a sum of a compact operator $A$ and a finite-rank operator). So if $u \in \mathrm{Span}\{u_1, \dots, u_n\}$, then $A_n u = 0$ (because orthonormality of the eigenvectors so far gives us $A_n u_j = 0$ for all $j \in \{1, \dots, n\}$ and then we can use linearity to extend to the span). Furthermore, for any $u \in \mathrm{Span}\{u_1, \dots, u_n\}^\perp$, we have $A_n u = Au$ because the sum term drops out.


Therefore, for any $u \in H$ and any $v \in \mathrm{Span}\{u_1, \dots, u_n\}$, we have
\[ \langle A_n u, v \rangle = \langle u, A_n v \rangle = 0 \]
(first step because $A_n$ is self-adjoint and second step from our work above). Another way to say this is that $A_n u$ is always in the orthogonal complement of $\mathrm{Span}\{u_1, \dots, u_n\}$, so
\[ \mathrm{Range}(A_n) \subset \mathrm{Span}\{u_1, \dots, u_n\}^{\perp}. \]


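From this fact, we learn that if $A_n u = \lambda u$ for some nonzero $\lambda$, then $u = A_n(u/\lambda)$ is in the range of $A_n$, so it is in $\mathrm{Span}\{u_1, \dots, u_n\}^{\perp}$. From our work above, this means that $\lambda u = A_n u = Au$, so any nonzero eigenvalue of $A_n$ is also a nonzero eigenvalue of $A$. We can therefore apply our previous theorem to see that $A_n$ has a nonzero eigenvalue $\lambda_{n+1}$ with unit eigenvector $u_{n+1}$ (orthogonal to the span of $\{u_1, \dots, u_n\}$ because $u_{n+1}$ lies in the range of $A_n$), with $|\lambda_{n+1}| = \sup_{\|u\| = 1} |\langle A_n u, u \rangle|$. Since we're still working in the same Hilbert space, this expression can be written in terms of $A$ as well. First, we note that
\[ |\lambda_{n+1}| = \sup_{\substack{\|u\| = 1 \\ u \in \mathrm{Span}\{u_1, \dots, u_n\}^{\perp}}} |\langle A_n u, u \rangle|, \]
since $A_n$ is zero on $\mathrm{Span}\{u_1, \dots, u_n\}$ anyway, and then when we restrict to those $u$, we have $A_n u = Au$, so this is
\[ = \sup_{\substack{\|u\| = 1 \\ u \in \mathrm{Span}\{u_1, \dots, u_n\}^{\perp}}} |\langle Au, u \rangle|, \]
which gives us the desired equation. We also preserve the ordering of eigenvalues, because the supremum over a smaller set cannot be larger:
\[ |\lambda_{n+1}| = \sup_{\substack{\|u\| = 1 \\ u \in \mathrm{Span}\{u_1, \dots, u_n\}^{\perp}}} |\langle Au, u \rangle| \le \sup_{\substack{\|u\| = 1 \\ u \in \mathrm{Span}\{u_1, \dots, u_{n-1}\}^{\perp}}} |\langle Au, u \rangle| = |\lambda_n|. \]
Finally, because $|\lambda_{n+1}| = |\langle A u_{n+1}, u_{n+1} \rangle|$, we've shown all of the results above and finished the proof.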
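
The inductive construction in this proof has a direct finite-dimensional analog, which may help make the deflation step concrete. The following is only a sketch under stated assumptions: the $3 \times 3$ symmetric matrix, the starting vector, and the helper names `top_eigenpair` and `deflate` are all made up for illustration, and power iteration is swapped in for the abstract existence theorem used in the notes.

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def top_eigenpair(M, start, iters=500):
    """Power iteration: eigenvalue of largest magnitude and a unit eigenvector."""
    u = normalize(start)
    for _ in range(iters):
        u = normalize(matvec(M, u))
    lam = dot(matvec(M, u), u)  # Rayleigh quotient <Mu, u>
    return lam, u

def deflate(M, lam, u):
    """Matrix of the operator v -> M v - lam <v, u> u (the analog of A_n)."""
    n = len(M)
    return [[M[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]

# Hypothetical symmetric matrix with eigenvalues 3, -2.5, and 1.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, -2.5]]

eigenvalues = []
M = A
for _ in range(3):
    lam, u = top_eigenpair(M, [1.0, 0.5, 0.25])
    eigenvalues.append(lam)
    M = deflate(M, lam, u)  # remove the eigenpair we just found

print(eigenvalues)
```

The eigenvalues come out in order of decreasing magnitude, which mirrors the inequality $|\lambda_{n+1}| \le |\lambda_n|$ shown above.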


Theorem 232 (Spectral theorem) 
Let $A = A^*$ be a self-adjoint compact operator on a separable Hilbert space $H$. If $|\lambda_1| \ge |\lambda_2| \ge \cdots$ are the nonzero eigenvalues of $A$, counted with multiplicity and with corresponding orthonormal eigenvectors $\{u_k\}_k$, then $\{u_k\}_k$ is an orthonormal basis for $\mathrm{Range}(A)$ and also of $\overline{\mathrm{Range}(A)}$, and there is an orthonormal basis $\{f_j\}_j$ of $\mathrm{Null}(A)$ so that $\{u_k\}_k \cup \{f_j\}_j$ form an orthonormal basis of $H$.


In other words, we can find an orthonormal basis consisting entirely of eigenvectors for our self-adjoint compact 


operator (since the nullspace corresponds to eigenvectors of eigenvalue 0). 


Proof. First, note that the process described in the proof of the maximum principle terminates if and only if $A$ is finite rank, meaning that there is some $n$ with $Au = \sum_{k=1}^{n} \lambda_k \langle u, u_k \rangle u_k$. In such a case, $\mathrm{Range}(A) \subset \mathrm{Span}\{u_1, \dots, u_n\}$, and thus $\{u_k\}$ do indeed form an orthonormal basis for $\mathrm{Range}(A)$ and also of $\overline{\mathrm{Range}(A)}$.

Otherwise, the process does not terminate, and thus we have countably infinitely many nonzero eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$, counted with multiplicity. We know that $|\lambda_k| \to 0$, and we also know that the $u_k$s form an orthonormal subset of $\mathrm{Range}(A)$. To show it is a basis, we must show that if $f \in \mathrm{Range}(A)$ and $\langle f, u_k \rangle = 0$ for all $k$, then $f = 0$.

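To do that, we first write $f = Au$ for some $u \in H$, meaning that $\langle Au, u_k \rangle = 0$ for all $k$. Since $A$ is self-adjoint, this means (because $\lambda_k$ is real) that
\[ \lambda_k \langle u, u_k \rangle = \langle u, \lambda_k u_k \rangle = \langle u, A u_k \rangle = \langle Au, u_k \rangle = 0. \]
Therefore $u$ is orthogonal to all $u_k$ (since each $\lambda_k$ is nonzero), so by the maximum principle,
\[ \|f\| = \|Au\| = \left\| Au - \sum_{k=1}^{n} \lambda_k \langle u, u_k \rangle u_k \right\| \]
(because each term in the sum is zero), meaning that we can rewrite this as
\[ = \|A_n u\| \le |\lambda_{n+1}| \cdot \|u\|. \]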


Taking $n \to \infty$ and noting that the $\lambda_n$s converge to 0, we must have $\|f\| = 0$. This proves that the eigenvectors indeed form an orthonormal basis for the range of $A$. To show that they also form an orthonormal basis for the closure of that range, notice that
\[ \mathrm{Range}(A) \subset \overline{\mathrm{Span}\{u_k\}_k}, \]
and now remember that the span is the set of finite linear combinations of the $u_k$s, but the closure of that can be written as
\[ \overline{\mathrm{Span}\{u_k\}_k} = \left\{ \sum_k c_k u_k : \sum_k |c_k|^2 < \infty \right\}. \]
Therefore, $\{u_k\}$ must indeed be an orthonormal basis for $\overline{\mathrm{Range}(A)}$. We finish by noting that this means we have an orthonormal basis of
\[ \overline{\mathrm{Range}(A)} = (\mathrm{Range}(A)^{\perp})^{\perp} = (\mathrm{Null}(A))^{\perp}, \]
so to complete the orthonormal basis of $H$, we just need an orthonormal basis of $\mathrm{Null}(A)$, which exists because $H$ is separable and thus $\mathrm{Null}(A)$ is also separable.
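
A tiny finite-dimensional check may help fix the roles of $\{u_k\}$ and $\{f_j\}$. The sketch below assumes a hypothetical $2 \times 2$ symmetric matrix with one nonzero eigenvalue and a one-dimensional null space; the vectors `u1`, `f1` and the test vector `v` are chosen by hand for illustration and are not taken from the lecture.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

s = 1.0 / math.sqrt(2.0)
A = [[1.0, 1.0],
     [1.0, 1.0]]   # symmetric, eigenvalues 2 and 0
u1 = [s, s]        # unit eigenvector for the nonzero eigenvalue 2; spans Range(A)
f1 = [s, -s]       # unit vector spanning Null(A)

v = [3.0, -1.0]
Av = matvec(A, v)

# A v = sum over nonzero eigenvalues of lambda_k <v, u_k> u_k
expansion = [2.0 * dot(v, u1) * x for x in u1]

# {u1, f1} is an orthonormal basis of the whole space, so v is rebuilt from both
rebuilt = [dot(v, u1) * a + dot(v, f1) * b for a, b in zip(u1, f1)]

print(Av, expansion, rebuilt)
```

Here only `u1` is needed to reproduce `Av`, while reconstructing `v` itself also requires the null-space vector `f1`.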


We'll see an application of this to differential equations and functional calculus next time! 


23 May 13, 2021 


In this last lecture, we'll apply functional analysis to the Dirichlet problem (understanding ODEs with conditions at the 
boundary). In an introductory differential equations class, we often state initial conditions by specifying the value and 


derivatives of a function at a given point, but what we're doing here is slightly different: 


Problem 233 (Dirichlet problem) 


Let $V \in C([0,1])$ be a continuous, real-valued function. We wish to solve the differential equation
\[ -u''(x) + V(x) u(x) = f(x) \quad \forall x \in [0,1], \]
\[ u(0) = u(1) = 0. \]

We can think of this as specifying a "force" $f \in C([0,1])$ and seeing whether there exists a unique solution $u \in C^2([0,1])$ to the differential equation above. It turns out the answer is always yes when $V \ge 0$, and that's what we'll show today.


Theorem 234 


Let $V \ge 0$. If $f \in C([0,1])$ and $u_1, u_2 \in C^2([0,1])$ both satisfy the Dirichlet problem, then $u_1 = u_2$.


Proof. If $u = u_1 - u_2$, then $u \in C^2([0,1])$, and we have a solution to
\[ -u''(x) + V(x) u(x) = 0 \quad \forall x \in [0,1], \]
\[ u(0) = u(1) = 0. \]
We now note that it is true that
\[ 0 = \int_0^1 \left( -u''(x) + V(x) u(x) \right) \overline{u(x)} \, dx \]
because the integrand is always zero, and now we can split up this integral into
\[ 0 = -\int_0^1 u''(x) \overline{u(x)} \, dx + \int_0^1 V(x) |u(x)|^2 \, dx. \]
Integration by parts on the first term gives
\[ 0 = -u'(x) \overline{u(x)} \Big|_0^1 + \int_0^1 u'(x) \overline{u'(x)} \, dx + \int_0^1 V(x) |u(x)|^2 \, dx. \]
The first term now vanishes by our Dirichlet boundary conditions, and we're left with
\[ 0 = \int_0^1 |u'(x)|^2 \, dx + \int_0^1 V(x) |u(x)|^2 \, dx. \]
Since $V$ is nonnegative, the second term is always nonnegative, and thus $0 \ge \int_0^1 |u'(x)|^2 \, dx \ge 0$, and we can only have equality if $u'(x) = 0$ everywhere (since we have a continuous function). This combined with the Dirichlet boundary conditions implies that $u = 0$, so $u_1 = u_2$.


Showing existence is more involved, and we'll start by doing an easier case, specifically the one where V = 0. It 


turns out that we can write down the solution explicitly using a self-adjoint compact operator: 


Theorem 235 
Define the continuous function $K \in C([0,1] \times [0,1])$ via
\[ K(x, y) = \begin{cases} (1 - x)\, y, & 0 \le y \le x \le 1, \\ (1 - y)\, x, & 0 \le x \le y \le 1. \end{cases} \]
Then if $Af(x) = \int_0^1 K(x, y) f(y) \, dy$, then $A \in B(L^2([0,1]))$ is a compact self-adjoint operator, and for $f \in C([0,1])$, $Af$ solves the Dirichlet problem with $V = 0$ (meaning that $u = Af$ is the unique solution to $-u''(x) = f(x)$, $u(0) = u(1) = 0$).


(The fact that the solution can be written in terms of an integral operator may not be surprising, since differentiation 


and integration are inverse operations by the fundamental theorem of calculus.) 
Proof. First, we let
\[ C = \sup_{[0,1] \times [0,1]} |K(x, y)|, \]
which is finite because $K$ is continuous. Then by Cauchy-Schwarz, we have
\[ |Af(x)| = \left| \int_0^1 K(x, y) f(y) \, dy \right| \le C \int_0^1 |f(y)| \, dy \le C \left( \int_0^1 1^2 \, dy \right)^{1/2} \left( \int_0^1 |f(y)|^2 \, dy \right)^{1/2} \]
by thinking of the integral of $|f|$ as $\langle |f|, 1 \rangle$, and this shows that $|Af(x)| \le C \|f\|_2$. We can also get the bound
\[ |Af(x) - Af(z)| \le \sup_{y \in [0,1]} |K(x, y) - K(z, y)| \cdot \|f\|_2 \]
using an analogous argument. So now we can use the Arzela-Ascoli theorem (giving sufficient conditions for a sequence of functions to have a convergent subsequence) and conclude that $A$ is a compact operator on $L^2([0,1])$ (details left for us). In fact, this also shows that $Af \in C([0,1])$. Furthermore, $A$ is self-adjoint because for any $f, g \in C([0,1])$,


we have (under the $L^2$ pairing)
\[ \langle Af, g \rangle_2 = \int_0^1 \left( \int_0^1 K(x, y) f(y) \, dy \right) \overline{g(x)} \, dx = \int_0^1 \int_0^1 f(y) K(x, y) \overline{g(x)} \, dx \, dy \]
by Fubini's theorem (since we can swap the order of integration for continuous functions). We can then rewrite this expression as a different pairing
\[ = \int_0^1 f(y) \overline{\left( \int_0^1 \overline{K(x, y)} g(x) \, dx \right)} \, dy = \int_0^1 f(y) \overline{\left( \int_0^1 K(y, x) g(x) \, dx \right)} \, dy, \]
where we've used the fact that $\overline{K(x, y)} = K(x, y)$ (because everything is real) and also that $K(x, y) = K(y, x)$ by definition. So what we end up with is just $\langle f, Ag \rangle$. Since $f, g$ were arbitrary continuous functions to start with, and $C([0,1])$ is a dense subset of $L^2([0,1])$, this means that $A$ is self-adjoint (since the relation $\langle Af, g \rangle = \langle f, Ag \rangle$ must hold for all $f, g \in L^2$ by a density argument).

We now need to verify that for continuous $f$, $Af$ is a twice differentiable function that solves the Dirichlet problem with $V = 0$. Indeed, we write out
\[ u(x) = Af(x) = (1 - x) \int_0^x y f(y) \, dy + x \int_x^1 (1 - y) f(y) \, dy, \]
and by the fundamental theorem of calculus (just a computation) we can indeed verify $u \in C^2([0,1])$ with $-u'' = f$; the boundary conditions $u(0) = u(1) = 0$ can be read off directly from the formula. Uniqueness follows from Theorem 234 above.
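
As a sanity check on the integral formula, one can apply $A$ numerically to the constant force $f = 1$, for which the solution of $-u'' = 1$, $u(0) = u(1) = 0$ is $u(x) = x(1 - x)/2$. The sketch below is only illustrative: it assumes the kernel is taken with the sign for which $-(Af)'' = f$, it uses a midpoint rule for the integral, and the names `kernel` and `apply_A` as well as the grid size are made up here.

```python
def kernel(x, y):
    """Green's function for -u'' = f, u(0) = u(1) = 0 (sign chosen so that -(Af)'' = f)."""
    return (1.0 - x) * y if y <= x else (1.0 - y) * x

def apply_A(f, x, n=2000):
    """Approximate (Af)(x) = int_0^1 K(x, y) f(y) dy by the midpoint rule."""
    h = 1.0 / n
    return h * sum(kernel(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n))

def f(y):
    return 1.0  # constant force

def exact(x):
    return 0.5 * x * (1.0 - x)  # solves -u'' = 1 with u(0) = u(1) = 0

points = [0.0, 0.25, 0.5, 0.75, 1.0]
errors = [abs(apply_A(f, x) - exact(x)) for x in points]
print(max(errors))
```

The computed values vanish at $x = 0$ and $x = 1$ because the kernel itself vanishes there, which is how the boundary conditions are built into $A$.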


We thus have an explicit solution for $V = 0$, and to solve the Dirichlet problem in general for $V \ne 0$, we will think about $-u'' + Vu = f$ via
\[ -u'' = f - Vu \implies u = A(f - Vu) \]
by thinking of the right-hand side $f - Vu$ as a fixed function of $x$ and using the result we just proved. Therefore,
\[ (I + AV) u = Af, \]
and we've now gotten rid of differentiation and are just solving an equation in terms of bounded operators, though $AV$ is not generally self-adjoint because $(AV)^* = VA$. We can get around this issue, though: if we write $u = A^{1/2} v$ (defining $A^{1/2}$ to be some operator such that applying it twice gives us $A$), then our equation becomes
\[ A^{1/2} (I + A^{1/2} V A^{1/2}) v = Af \implies (I + A^{1/2} V A^{1/2}) v = A^{1/2} f, \]
and we do indeed have self-adjoint operators here because $(A^{1/2} V A^{1/2})^* = A^{1/2} V A^{1/2}$, and from there, we can use the Fredholm alternative. Of course, everything here is not fully justified, but that's what we'll be more careful about now:

Theorem 236
We have $\mathrm{Null}(A) = \{0\}$, and the orthonormal eigenvectors for $A$ are given by
\[ u_k(x) = \sqrt{2} \sin(k \pi x), \quad k \in \mathbb{N}, \]
with associated eigenvalues $\lambda_k = \frac{1}{k^2 \pi^2}$.


Remark 237. As a corollary, the spectral theorem (from last lecture) then tells us that $\{\sqrt{2} \sin(k \pi x)\}$ gives us an orthonormal basis of $L^2([0,1])$, which is a result we can also prove by rescaling our Fourier series result from $L^2([-\pi, \pi])$.


Proof. First, we'll show that the nullspace of $A$ is trivial by showing that the range of $A$ is dense in $L^2$. Indeed, if $u$ is a polynomial on $[0,1]$ with $f = -u''$ and $u(0) = u(1) = 0$, then $Af$ is the unique solution to the Dirichlet problem with $V = 0$, meaning that $-(Af)'' = f$ and $Af(0) = Af(1) = 0$. Therefore $Af = u$, and therefore any polynomial vanishing at $x = 0$ and $x = 1$ is in the range of $A$. Since the polynomials vanishing at $\{0, 1\}$ are dense in the set of continuous functions vanishing at $\{0, 1\}$ (using the Weierstrass approximation theorem), and that set is dense in $L^2$, we have indeed shown that the range of $A$ is dense in $L^2$ as desired. From here, notice that $\overline{\mathrm{Range}(A)} = \mathrm{Null}(A)^{\perp}$, so if the left-hand side is $H$, then the right-hand side must have $\mathrm{Null}(A) = \{0\}$.

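To show the statement about eigenvectors, suppose we have some eigenvalue $\lambda \ne 0$ and normalized eigenvector $u$ such that $Au = \lambda u$. Then because (as discussed before) the function $Af$ is always continuous by our bound on $|Af(x) - Af(z)|$, $u = \frac{1}{\lambda} Au$ is continuous. Thus $Au$ must be twice continuously differentiable, and so $u = \frac{1}{\lambda} Au$ is also twice continuously differentiable. So we now have (by linearity)
\[ u = A\left( \frac{u}{\lambda} \right) \implies -u'' = \frac{1}{\lambda} u, \quad u(0) = u(1) = 0, \]
where we're using the fact that $-(Af)'' = f$. For $\lambda > 0$ this is a simple harmonic oscillator, meaning
\[ u(x) = a \sin\left( \frac{x}{\sqrt{\lambda}} \right) + b \cos\left( \frac{x}{\sqrt{\lambda}} \right) \]
(for $\lambda < 0$ the solutions are hyperbolic sines and cosines, and no nonzero one satisfies both boundary conditions). Plugging in $u(0) = 0$ tells us that $b = 0$, and plugging in $u(1) = 0$ tells us that $\frac{1}{\sqrt{\lambda}} = k \pi$ for some $k \in \mathbb{N}$, which tells us that $\lambda = \frac{1}{k^2 \pi^2}$ and that the solutions take the form $u(x) = a \sin(k \pi x)$ for some integer $k$, as desired (and $a = \sqrt{2}$ by direct computation of the $L^2$ norm).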
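
The eigenvalue relation $A u_k = \frac{1}{k^2 \pi^2} u_k$ can also be checked numerically. The sketch below is a hedged illustration rather than part of the proof: it assumes the kernel sign for which $-(Af)'' = f$, approximates the integral with a midpoint rule, and uses made-up helper names (`kernel`, `apply_A`, `eigenfunction`) and sample points.

```python
import math

def kernel(x, y):
    """Green's function for -u'' = f, u(0) = u(1) = 0 (sign chosen so that -(Af)'' = f)."""
    return (1.0 - x) * y if y <= x else (1.0 - y) * x

def apply_A(f, x, n=4000):
    """Approximate (Af)(x) = int_0^1 K(x, y) f(y) dy by the midpoint rule."""
    h = 1.0 / n
    return h * sum(kernel(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n))

def eigenfunction(k):
    return lambda x: math.sqrt(2.0) * math.sin(k * math.pi * x)

max_residual = 0.0
for k in (1, 2, 3):
    u_k = eigenfunction(k)
    lam_k = 1.0 / (k * math.pi) ** 2
    for x in (0.1, 0.3, 0.5, 0.8):
        residual = abs(apply_A(u_k, x) - lam_k * u_k(x))
        max_residual = max(max_residual, residual)

print(max_residual)
```

The residual is limited only by the quadrature error, which is consistent with $\sqrt{2}\sin(k \pi x)$ being an eigenfunction with eigenvalue $1/(k^2 \pi^2)$.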


Since we now have a basis in which the operator $A$ is diagonal, we can construct $A^{1/2}$ by essentially taking the square roots of all of the eigenvalues (so that $A^{1/2} A^{1/2} = A$).


Definition 238 
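Let $f \in L^2([0,1])$, and suppose that $f(x) = \sum_{k=1}^{\infty} c_k \sqrt{2} \sin(k \pi x)$, where $c_k = \int_0^1 f(x) \sqrt{2} \sin(k \pi x) \, dx$. Then we define the linear operator $A^{1/2}$ via
\[ A^{1/2} f(x) = \sum_{k=1}^{\infty} \frac{c_k}{k \pi} \sqrt{2} \sin(k \pi x). \]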


Here, the reason for the $\frac{1}{k \pi}$ in the definition above is that we have a $\frac{1}{k^2 \pi^2}$ eigenvalue that we want to produce after two iterations of $A^{1/2}$. And it's useful to remember that taking two derivatives of $Af$ here recovers $-f$, because the second derivative of $\sin(k \pi x)$ is $-k^2 \pi^2 \sin(k \pi x)$.

Theorem 239
The operator $A^{1/2}$ is a compact, self-adjoint operator on $L^2([0,1])$, and $(A^{1/2})^2 = A$.


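Proof. Suppose we have $f(x) = \sum_{k=1}^{\infty} c_k \sqrt{2} \sin(k \pi x)$ and $g(x) = \sum_{k=1}^{\infty} d_k \sqrt{2} \sin(k \pi x)$. First of all,
\[ \|A^{1/2} f\|_2^2 = \left\| \sum_{k=1}^{\infty} \frac{c_k}{k \pi} \sqrt{2} \sin(k \pi x) \right\|_2^2 = \sum_{k=1}^{\infty} \frac{|c_k|^2}{k^2 \pi^2} \]
by Parseval's identity, and we can further bound this as
\[ \le \frac{1}{\pi^2} \sum_{k=1}^{\infty} |c_k|^2 = \frac{1}{\pi^2} \|f\|_2^2, \]
so $A^{1/2}$ is bounded. For self-adjointness, we can use the $\ell^2$-pairing coming out of the Fourier expansion:
\[ \langle A^{1/2} f, g \rangle = \sum_{k=1}^{\infty} \frac{c_k}{k \pi} \overline{d_k} = \sum_{k=1}^{\infty} c_k \overline{\left( \frac{d_k}{k \pi} \right)} = \langle f, A^{1/2} g \rangle. \]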
So we now need to show that $(A^{1/2})^2 = A$, and this is true because
\[ A^{1/2} (A^{1/2} f) = \sum_{k=1}^{\infty} \frac{1}{k \pi} \cdot \frac{c_k}{k \pi} \sqrt{2} \sin(k \pi x) = \sum_{k=1}^{\infty} \frac{c_k}{k^2 \pi^2} \sqrt{2} \sin(k \pi x), \]
and now because each term here is an eigenfunction of $A$, this can be written as
\[ = \sum_{k=1}^{\infty} c_k A(\sqrt{2} \sin(k \pi x)) = A \sum_{k=1}^{\infty} c_k (\sqrt{2} \sin(k \pi x)) = Af \]
(we can move the $A$ out of the infinite sum because the finite sum converges in $L^2$-norm to the infinite sum, and because $A$ is bounded, $A$ applied to the finite sum converges to $A$ applied to the infinite sum). So $A^{1/2} A^{1/2} = A$.
We'll finish by briefly discussing why $A^{1/2}$ is compact. To do that, we can show that the image of the unit ball $\{A^{1/2} f : \|f\|_2 \le 1\}$ has equi-small tails (which suffices by our earlier characterizations of compactness). Indeed, for any $\varepsilon > 0$, we may pick an $N \in \mathbb{N}$ such that $\frac{1}{N^2} < \varepsilon^2$. Then for any $f \in L^2([0,1])$ with $\|f\|_2 \le 1$, we have
\[ \sum_{k > N} |\langle A^{1/2} f, \sqrt{2} \sin(k \pi x) \rangle|^2 = \sum_{k > N} \frac{|c_k|^2}{k^2 \pi^2} \le \frac{1}{N^2} \sum_{k > N} |c_k|^2 \le \frac{1}{N^2} \|f\|_2^2 \le \frac{1}{N^2} < \varepsilon^2, \]
so $A^{1/2}$ satisfies the desired conditions and is indeed compact.
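
To see the definition of $A^{1/2}$ in action, one can work directly with sine coefficients. The following sketch assumes the test function $f = 1$, truncates the series at a hypothetical cutoff `K`, and uses made-up helper names; it checks that applying the coefficient rule $c_k \mapsto c_k / (k \pi)$ twice reproduces $Af = x(1 - x)/2$, and that the bound $\|A^{1/2} f\|_2 \le \frac{1}{\pi} \|f\|_2$ holds for the truncated coefficients.

```python
import math

K = 200  # truncation level of the sine series (an assumption of this sketch)

def sine_coeffs_of_one():
    """c_k = int_0^1 sqrt(2) sin(k pi x) dx = sqrt(2) (1 - cos(k pi)) / (k pi)."""
    return [math.sqrt(2.0) * (1.0 - math.cos(k * math.pi)) / (k * math.pi)
            for k in range(1, K + 1)]

def sqrt_A(coeffs):
    """A^{1/2} acts diagonally on sine coefficients: c_k -> c_k / (k pi)."""
    return [c / (k * math.pi) for k, c in enumerate(coeffs, start=1)]

def evaluate(coeffs, x):
    """Evaluate sum_k coeffs[k] * sqrt(2) sin(k pi x)."""
    return sum(c * math.sqrt(2.0) * math.sin(k * math.pi * x)
               for k, c in enumerate(coeffs, start=1))

c = sine_coeffs_of_one()
half = sqrt_A(c)          # coefficients of A^{1/2} f
Af = sqrt_A(half)         # coefficients of A^{1/2} A^{1/2} f

errors = [abs(evaluate(Af, x) - 0.5 * x * (1.0 - x)) for x in (0.1, 0.5, 0.9)]
norm_sq_f = sum(x * x for x in c)          # Parseval: truncated ||f||_2^2
norm_sq_half = sum(x * x for x in half)    # truncated ||A^{1/2} f||_2^2
print(max(errors), norm_sq_half, norm_sq_f / math.pi ** 2)
```

Working with coefficients rather than function values is a design choice that makes the diagonal structure of $A^{1/2}$ explicit; the only approximation is the truncation of the series.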


Now that we have the operator $A^{1/2}$, we'll put it to good use:


Theorem 240 
Let $V \in C([0,1])$ be a real-valued function, and define
\[ m_V f(x) = V(x) f(x) \]
to be the multiplication operator. Then $m_V$ is a bounded linear operator on $L^2([0,1])$ and self-adjoint.


(This is left as an exercise for us.) 




Theorem 241 
Let $V \in C([0,1])$ be a real-valued function. Then $T = A^{1/2} m_V A^{1/2}$ is a self-adjoint compact operator on $L^2([0,1])$, and $T$ is a bounded operator from $L^2([0,1])$ to $C([0,1])$.


Proof. The first part of this result follows directly from what we've already shown: since $A^{1/2}$ is compact and $m_V$ is bounded, the product $A^{1/2} m_V A^{1/2}$ is compact, and it's self-adjoint because $A^{1/2}$ and $m_V$ are self-adjoint and $(A^{1/2} m_V A^{1/2})^* = (A^{1/2})^* m_V^* (A^{1/2})^* = A^{1/2} m_V A^{1/2}$ (remembering to reverse the order in which the operators appear). For the remaining step, it suffices to show that $A^{1/2}$ is a bounded linear operator from $L^2([0,1])$ to $C([0,1])$: indeed for any $f(x) = \sum_{k=1}^{\infty} c_k \sqrt{2} \sin(k \pi x)$, we have
\[ A^{1/2} f(x) = \sum_{k=1}^{\infty} \frac{c_k}{k \pi} \sqrt{2} \sin(k \pi x). \]
Here $|\sqrt{2} \sin(k \pi x)| \le \sqrt{2}$, and Cauchy-Schwarz tells us that the coefficients form a summable series:
\[ \sum_{k=1}^{\infty} \frac{|c_k|}{k \pi} \le \frac{1}{\pi} \left( \sum_{k=1}^{\infty} \frac{1}{k^2} \right)^{1/2} \left( \sum_{k=1}^{\infty} |c_k|^2 \right)^{1/2} = \frac{1}{\sqrt{6}} \|f\|_2. \]
We thus find that $A^{1/2} f \in C([0,1])$ by the Weierstrass M-test, satisfying the bound $|A^{1/2} f(x)| \le \frac{1}{\sqrt{3}} \|f\|_2$, and this shows that $A^{1/2}$ (and thus $T$, since $m_V A^{1/2}$ is bounded on $L^2$) is a bounded linear operator from $L^2([0,1])$ to $C([0,1])$. Furthermore, because each term of the series defining $A^{1/2} f(x)$ evaluates to 0 at $x = 0, 1$, we must have $A^{1/2} f(0) = A^{1/2} f(1) = 0$ for all $f$.


We now have all of the ingredients that we need to solve our problem: 


Theorem 242 
Let $V \in C([0,1])$ be a nonnegative real-valued continuous function, and let $f \in C([0,1])$. Then there exists a (unique) twice-differentiable solution $u \in C^2([0,1])$ such that
\[ -u'' + Vu = f \quad \forall x \in [0,1], \]
\[ u(0) = u(1) = 0. \]


Proof. We know that $A^{1/2} m_V A^{1/2}$ is a self-adjoint compact operator, so by the Fredholm alternative, $I + A^{1/2} m_V A^{1/2}$ has an inverse if and only if the nullspace is trivial. Suppose that $(I + A^{1/2} m_V A^{1/2}) g = 0$ for some $g \in L^2$. Then
\[ 0 = \langle (I + A^{1/2} m_V A^{1/2}) g, g \rangle = \|g\|_2^2 + \langle A^{1/2} m_V A^{1/2} g, g \rangle \]
by linearity, and now we can move one of the $A^{1/2}$s over by self-adjointness to get
\[ 0 = \|g\|_2^2 + \langle m_V A^{1/2} g, A^{1/2} g \rangle = \|g\|_2^2 + \int_0^1 V |A^{1/2} g|^2 \, dx. \]
Since $V \ge 0$, the second term is always nonnegative, meaning that we have $0 \ge \|g\|_2^2 \ge 0$. The only way for this to happen is if $g = 0$. Thus $I + A^{1/2} m_V A^{1/2}$ is indeed invertible.


To finish, we define
\[ v = (I + A^{1/2} m_V A^{1/2})^{-1} A^{1/2} f, \quad u = A^{1/2} v. \]
Then some manipulation yields
\[ u + A(Vu) = A^{1/2} v + A^{1/2} (A^{1/2} m_V A^{1/2}) v = A^{1/2} (I + A^{1/2} m_V A^{1/2}) v, \]
and plugging in the definition of $v$ gives us
\[ u + AVu = A^{1/2} A^{1/2} f = Af. \]
And this is what we want: $u = A^{1/2} v$ is continuous by the proof of Theorem 241, so $f - Vu$ is continuous and $u = A(f - Vu)$ is twice continuously differentiable. Taking two derivatives on both sides then gives us
\[ u'' - Vu = -f \implies -u'' + Vu = f, \]
and thus $u$ indeed solves the differential equation. Furthermore, the last argument in the proof of Theorem 241 tells us that $u = A^{1/2} v$ indeed satisfies the Dirichlet boundary conditions, and thus we've solved the Dirichlet problem.
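
The reformulation $u = A(f - Vu)$ can be tried out numerically. The sketch below is not the Fredholm-alternative argument from the proof: it swaps in a plain fixed-point iteration $u \leftarrow A(f - Vu)$, which converges here only because the hypothetical potential $V(x) = 1 + x$ satisfies $\sup V \cdot \|A\| = 2/\pi^2 < 1$. The grid size, the manufactured solution $u(x) = \sin(\pi x)$, the kernel sign (chosen so that $-(Af)'' = f$), and all names are assumptions made for illustration.

```python
import math

n = 200                      # number of midpoint grid cells (assumption)
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

def kernel(x, y):
    """Green's function for -u'' = f, u(0) = u(1) = 0 (sign chosen so that -(Af)'' = f)."""
    return (1.0 - x) * y if y <= x else (1.0 - y) * x

# Midpoint-rule matrix: (A g)(x_i) is approximated by sum_j A[i][j] * g(x_j)
A = [[h * kernel(x, y) for y in xs] for x in xs]

V = [1.0 + x for x in xs]                       # nonnegative potential, sup V = 2
exact = [math.sin(math.pi * x) for x in xs]     # manufactured solution
f = [(math.pi ** 2 + v) * e for v, e in zip(V, exact)]  # f = -u'' + V u

u = [0.0] * n
for _ in range(40):
    rhs = [fi - vi * ui for fi, vi, ui in zip(f, V, u)]   # f - V u
    u = [sum(a * r for a, r in zip(row, rhs)) for row in A]  # u <- A(f - V u)

max_error = max(abs(a - b) for a, b in zip(u, exact))
print(max_error)
```

For larger potentials the iteration need not converge, which is one reason the proof instead relies on the invertibility of $I + A^{1/2} m_V A^{1/2}$.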
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